Documentation update (#102)

* README update to include Baylor counts
* updated drop installation command
* update install command docs to include conda-forge plus better descriptions

Co-authored-by: Vicente <yepez@in.tum.de>
Co-authored-by: Michaela Müller <mi.mueller@tum.de>
Co-authored-by: Christian Mertes <mertes@in.tum.de>
gagneurlab · Aug 2, 2020 · 966da08
1 parent 8679afe
commit 966da08

Showing 5 changed files with 99 additions and 77 deletions.
diff --git a/README.md b/README.md
@@ -8,40 +8,29 @@ The manuscript main file, supplementary figures and table can be found in the ma
 
 <img src="drop_sticker.png" alt="drop logo" width="200" class="center"/>
 
-## Installation
-DROP is available on [bioconda](https://anaconda.org/bioconda/drop) for python 3.6 and above.
-We recommend using a dedicated conda environment.
-
+## Quickstart
+DROP is available on [bioconda](https://anaconda.org/bioconda/drop).
+We recommend using a dedicated conda environment. (installation time: ~ 10min)
 ```
-# create environment
-conda create -n drop_env python=3.6
-conda activate drop_env
-
-# install drop
-conda install -c bioconda drop
+conda install -c conda-forge -c bioconda drop
 ```
-Installation time: ~ 10min
-
-Test whether the pipeline runs through by setting up the demo dataset in an empty directory (e.g. ``~/drop_demo``).
 
+Test installation with demo project
 ```
 mkdir ~/drop_demo
 cd ~/drop_demo
-
-# demo will download the necessary data and pipeline files
 drop demo
 ```
 
-The pipeline can be run using `snakemake` commands
-
+The pipeline can be run using [snakemake](https://snakemake.readthedocs.io/) commands
 ```
 snakemake -n # dryrun
-snakemake
+snakemake --cores 1
 ```
 
 Expected runtime: 25 min
 
-For more information on different installation options, check out the 
+For more information on different installation options, refer to the
 [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/installation.html)
 
 ## Set up a custom project
@@ -66,4 +55,8 @@ The following publicly-available datasets of gene counts can be used as controls
 
 * 119 non-strand specific fibroblasts: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3887451.svg)](https://doi.org/10.5281/zenodo.3887451)
 
+* 139 strand specific fibroblasts: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3963474.svg)](https://doi.org/10.5281/zenodo.3963474)
+
+* 125 strand specific blood: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3963470.svg)](https://doi.org/10.5281/zenodo.3963470)
+
 If you want to contribute with your own count matrices, please contact us: yepez at in.tum.de
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -3,8 +3,8 @@ DROP - Detection of RNA Outliers Pipeline
 
 DROP is intended to help researchers use RNA-Seq data in order to detect genes with aberrant expression,
 aberrant splicing and mono-allelic expression. It consists of three independent modules for each of those strategies.
-After installing DROP, the user needs to fill in the config file and sample annotation table (:ref:`prepare`).
-Then, DROP can be executed in multiple ways (:ref:`pipeline`).
+After installing DROP, the user needs to fill in the config file and sample annotation table (:doc:`prepare`).
+Then, DROP can be executed in multiple ways (:doc:`pipeline`).
 
 .. toctree::
    :maxdepth: 2
@@ -19,23 +19,28 @@ Then, DROP can be executed in multiple ways (:ref:`pipeline`).
 Quickstart
 -----------
 
-DROP is available on `bioconda <https://anaconda.org/bioconda/drop>`_ for python 3.6 and above.
-We recommend using a dedicated conda environment.
+DROP is available on `bioconda <https://anaconda.org/bioconda/drop>`_.
+We recommend using a dedicated conda environment. (installation time: ~ 10min)
 
 .. code-block:: bash
 
-    conda install -c bioconda drop
+    conda install -c conda-forge -c bioconda drop
 
-Initialize project
+Test installation with demo project
 
 .. code-block:: bash
 
-    cd <path-to-project>
+    mkdir ~/drop_demo
+    cd ~/drop_demo
     drop demo
 
-Call the pipeline
+The pipeline can be run using `snakemake <https://snakemake.readthedocs.io/>`_ commands
 
 .. code-block:: bash
 
-    snakemake
+    snakemake -n # dryrun
+    snakemake --cores 1
 
+Expected runtime: 25 min
+
+For more information on different installation options, refer to :doc:`installation`.
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -1,17 +1,18 @@
 Installation
 ============
 
-DROP is available on `bioconda <https://anaconda.org/bioconda/drop>`_ for python 3.6 and above.
-We recommend using a dedicated conda environment.
+DROP is available on `bioconda <https://anaconda.org/bioconda/drop>`_ .
+In case the conda channel priority is set to ``strict``, it should be reset to ``flexible``:
 
-.. code-block:: bash
+.. code-block::
+
+    conda config --set channel_priority flexible
 
-    # create environment
-    conda create -n drop_env python=3.6
-    conda activate drop_env
+We recommend using a dedicated conda environment (here: ``drop_env``) for installing drop.
+
+.. code-block:: bash
 
-    # install drop
-    conda install -c bioconda drop
+    conda create -n drop_env -c conda-forge -c bioconda drop
 
 Installation time: ~ 10min
 
@@ -25,12 +26,12 @@ Test whether the pipeline runs through by setting up the demo dataset in an empt
     # demo will download the necessary data and pipeline files
     drop demo
 
-The pipeline can be run using ``snakemake`` commands
+The pipeline can be run using `snakemake <https://snakemake.readthedocs.io/>`_ commands
 
 .. code-block:: bash
 
     snakemake -n # dryrun
-    snakemake
+    snakemake --cores 1
 
 Initialize a project
 --------------------
@@ -39,12 +40,13 @@ Alternatively, a new DROP project can be set up using ``drop init``.
 
 .. code-block:: bash
 
-    cd <path-to-project>
+    cd <path/to/project>
     drop init
 
 This will create an empty ``config.yaml`` file that needs to be filled according to the project data.
 You also need to prepare a sample annotation file.
-Go to :ref:`prepare` for more details.
+Go to :doc:`prepare` for more details.
+
 
 .. _otherversions:
 
@@ -53,62 +55,83 @@ Other DROP versions
 
 The developer version of DROP can be found in the `repository <https://github.com/gagneurlab/drop>`_ under the branch
 ``dev``.
-Make sure that the :any:`dependencies` are installed.
+Make sure that the :any:`prerequisites` are installed, preferably in a conda environment.
+Then install DROP from github using ``pip``.
 
 .. code-block:: bash
 
-    # activate your python environment if you are using one, e.g. drop_env
-    conda activate drop_env
+    pip install git+https://github.com/gagneurlab/drop.git@dev
+
 
-Then install DROP from github using ``pip``.
-For this recursively clone the repository with all its submodules and then install from directory.
+Alternatively, you can clone the desired branch of the repository and install from directory.
 
 .. code-block:: bash
 
-    git clone -b dev https://github.com/gagneurlab/drop.git --recurse-submodules
+    git clone -b dev https://github.com/gagneurlab/drop.git
     pip install ./drop
 
-Alternatively, you can also install it directly without cloning
+If the package needs to be updated frequently, it is more useful to use the ``-e`` option of ``pip``.
+Any new update pulled from the repository will be available without reinstall.
+Note that this requires an explicit call to update any existing project (:any:`dropUpdate`).
 
-.. code-block:: bash
+.. code-block::
 
-    pip install git+https://github.com/gagneurlab/drop.git@dev
+    pip install -e ./drop
 
-.. _dependencies:
+    # update project directory
+    cd <path/to/project>
+    drop update
 
-Dependencies
-------------
-The easiest way to ensure that all dependencies are installed is to install the
-`bioconda package <https://anaconda.org/bioconda/drop>`_ into a conda environment.
 
-.. code-block:: bash
+.. _prerequisites:
 
-    conda install -c bioconda drop
+Prerequisites
+-------------
 
-Other versions of drop can be installed after the bioconda package has been installed.
+The easiest way to ensure that all dependencies are installed is to install the bioconda package, as described above.
+Once the environment is set up and installation was successful, other versions of drop can be installed with ``pip``,
+overwriting the conda version of ``DROP`` (see :any:`otherversions`).
 
 
 Installation without conda
 ++++++++++++++++++++++++++
 Alternatively, DROP can be installed without ``conda``. In this case the following dependencies must be met:
 
-* python >= 3.6
-    * pip >= 19.1
-* `samtools <https://www.htslib.org/download/>`_ >= 1.7
-* `bcftools <https://github.com/samtools/bcftools>`_ >= 1.7
-* `tabix <https://www.htslib.org/download/>`_
-* `GATK <https://software.broadinstitute.org/gatk/>`_
-* `graphviz <https://www.graphviz.org/>`_
-* `pandoc <https://pandoc.org/>`_
-* `R <https://www.r-project.org/>`_ >= 3.5 and corresponding `bioconductor <https://bioconductor.org/install/>`_ version
-
-If you are using an already existing R installation, make sure that the R and ``bioconductor`` versions match.
-Otherwise, use the newest versions of R and bioconductor.
-The necessary R packages will be installed with the first pipeline call.
+* Programming languages:
+
+  * `python <https://www.python.org/>`_ >= 3.6 and `pip <https://pip.pypa.io/en/stable/installing/>`_ >= 19.1
+
+  * `R <https://www.r-project.org/>`_ >= 3.6 and corresponding `bioconductor <https://bioconductor.org/install/>`_ version
+
+* Commandline tools:
+
+    * `GNU bc <https://www.gnu.org/software/bc/>`_
+
+    * `GNU wget <https://www.gnu.org/software/wget/>`_
+
+    * `tabix <https://www.htslib.org/download/>`_
+
+    * `samtools <https://www.htslib.org/download/>`_ >= 1.7
+
+    * `bcftools <https://github.com/samtools/bcftools>`_ >= 1.7
+
+    * `GATK <https://software.broadinstitute.org/gatk/>`_ >= 4.0.4
+
+    * `graphviz <https://www.graphviz.org/>`_
+
+    * `pandoc <https://pandoc.org/>`_
+
+
+.. note::
+
+    If you are using an already existing R installation, make sure that the R and bioconductor versions match.
+    Otherwise, use the newest versions of R and bioconductor.
+
+All necessary R packages will be installed with the first pipeline call.
 As this is a lengthy process, it might be desirable to install them in advance, if a local copy of the repository exists.
 
 .. code-block:: bash
 
     # optional
-    Rscript <path-to-drop-repo>/drop/installRPackages.R drop/requirementsR.txt
+    Rscript <path/to/drop/repo>/drop/installRPackages.R drop/requirementsR.txt
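
For the conda-free installation described in the hunk above, a quick way to see which prerequisites are still missing is to test whether each executable resolves on ``PATH``. The following is only a minimal sketch: the helper ``missing_tools`` is hypothetical, and the executable names (``python3``, ``pip``, ``Rscript`` for R, ``gatk`` for GATK, ``dot`` for graphviz) are assumptions about a typical setup that may differ on your system. It does not verify the minimum versions listed in the documentation.

```shell
#!/usr/bin/env bash
# Sketch: report which DROP prerequisites cannot be found on PATH.
# Executable names below are assumptions, not taken from the DROP docs.

missing_tools() {
    # Print every argument that does not resolve to a command, one per line.
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Languages first, then the command-line tools from the prerequisites list.
missing=$(missing_tools python3 pip Rscript bc wget tabix samtools bcftools gatk dot pandoc)

if [ -n "$missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$missing"
else
    echo "All prerequisites found on PATH"
fi
```

The check only reports and never exits with an error, so it can be run safely before deciding whether to install the remaining tools manually or fall back to the conda package.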
 
diff --git a/docs/source/pipeline.rst b/docs/source/pipeline.rst
@@ -1,5 +1,3 @@
-.. _pipeline:
-
 Pipeline Commands
 =================
 
@@ -81,10 +79,13 @@ While running, Snakemake *locks* the directory. If, for a whatever reason, the p
 
 to unlock it. This will call snakemake's ``unlock`` command for every module
 
+.. _dropUpdate:
 
 Updating DROP
 +++++++++++++
-Everytime a project is initialized, a temporary folder ``.drop`` will be created in the project folder. If a new version of drop is installed, the ``.drop`` folder has to be updated for each project that has been initialized using an older version.
+Every time a project is initialized, a temporary folder ``.drop`` will be created in the project folder.
+If a new version of drop is installed, the ``.drop`` folder has to be updated for each project that has been
+initialized using an older version.
 To do this run:
 
 .. code-block:: bash

diff --git a/docs/source/prepare.rst b/docs/source/prepare.rst
@@ -1,5 +1,3 @@
-.. _prepare:
-
 Preparing the Input Data
 ========================
 
@@ -95,6 +93,7 @@ groups                        list       Same as in aberrant expression.
 minIds                        numeric    Same as in aberrant expression.                                                               ``1``
 recount                       boolean    If true, it forces samples to be recounted.                                                   ``false``
 longRead                      boolean    Set to true only if counting Nanopore or PacBio long reads.                                   ``false``
+keepNonStandardChrs           boolean    Set to true if non standard chromosomes are to be kept for further analysis.                  ``true``                        
 filter                        boolean    If false, no filter is applied. We recommend filtering.                                       ``true``
 minExpressionInOneSample      numeric    The minimal read count in at least one sample required for an intron to pass the filter.      ``20``
 minDeltaPsi                   numeric    The minimal variation (in delta psi) required for an intron to pass the filter.               ``0.05``
@@ -118,6 +117,7 @@ padjCutoff             numeric    Same as in aberrant expression.
 allelicRatioCutoff     numeric    A number between [0.5, 1) indicating the maximum allelic ratio allele1/(allele1+allele2) for the test to be significant.  ``0.8``
 addAF                  boolean    Whether or not to add the allele frequencies from gnomAD                                                                  ``true``
 maxAF                  numeric    Maximum allele frequency (of the minor allele) cut-off. Variants with AF equal or below this number are considered rare.  ``0.001``
+maxVarFreqCohort       numeric    Maximum variant frequency among the cohort.                                                                               ``0.05``      
 qcVcf                  character  Full path to the vcf file used for VCF-BAM matching                                                                       ``/path/to/qc_vcf.vcf.gz``
 qcGroups               list       Same as “groups”, but for the VCF-BAM matching                                                                            ``# see aberrant expression example``
 =====================  =========  ========================================================================================================================  ======
@@ -172,7 +172,7 @@ Specifically, the number of threads allowed for a computational step can be modi
 
 .. note::
 
-    DROP needs to be installed from a local directory :ref:`otherversions` using ``pip install -e <path-to-drop-repo>``
+    DROP needs to be installed from a local directory :ref:`otherversions` using ``pip install -e <path/to/drop-repo>``
     so that any changes in the code will be available in the next pipeline run.
     Any changes made to the R code need to be updated with ``drop update`` in the project directory.