update documentation

gagneurlab · Jul 18, 2022 · 7aee7a6 · 7aee7a6
1 parent 241ecaa
commit 7aee7a6
Showing 5 changed files with 48 additions and 6 deletions.
diff --git a/docs/source/help.rst b/docs/source/help.rst
@@ -1,6 +1,12 @@
-Help
-====
+Troubleshooting
+===============
 
 In case you have any issues, please open an issue on `git <https://github.com/gagneurlab/drop>`_.
 
-You can also write an e-mail to yepez@in.tum.de or mumichae@in.tum.de
+A common problem occurs during the ``MAE:mae_allelicCounts`` step when the BAM file does not have correct ``Read Groups`` information in both the header and the reads.
+You can often identify whether the BAM file is the problem by running ``gatk ValidateSamFile -I path/to/bam_file.bam``.
+
+The fix often depends on the individual case, but some combination of the following tools is usually helpful:
+
+* `samtools reheader <http://www.htslib.org/doc/samtools-reheader.html>`_
+* `gatk AddOrReplaceReadGroups <https://gatk.broadinstitute.org/hc/en-us/articles/5358911906459-AddOrReplaceReadGroups-Picard->`_
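
To illustrate what "correct read groups in both the header and the reads" means, here is a minimal Python sketch that checks SAM text (for example, the output of ``samtools view -h``) for reads whose ``RG`` tag is missing or not declared in an ``@RG`` header line. The function name ``check_read_groups`` is hypothetical and is not part of DROP, GATK, or samtools. It covers only this one class of problem and is far less thorough than ``gatk ValidateSamFile``.

```python
def check_read_groups(sam_text):
    """Return a list of read-group problems found in SAM-formatted text.

    Hypothetical helper for illustration only. Assumes the header lines
    precede the alignment lines, as in any valid SAM file.
    """
    header_ids = set()
    problems = []
    for line_no, line in enumerate(sam_text.splitlines(), start=1):
        if not line:
            continue
        if line.startswith("@"):
            # Header line: collect the ID of every @RG record.
            if line.startswith("@RG"):
                fields = dict(
                    f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
                )
                if "ID" in fields:
                    header_ids.add(fields["ID"])
                else:
                    problems.append(f"line {line_no}: @RG record without ID")
            continue
        # Alignment line: optional tags start at the 12th column.
        cols = line.split("\t")
        rg = next((t[5:] for t in cols[11:] if t.startswith("RG:Z:")), None)
        if rg is None:
            problems.append(f"line {line_no}: read {cols[0]} has no RG tag")
        elif rg not in header_ids:
            problems.append(
                f"line {line_no}: read {cols[0]} uses undeclared read group {rg!r}"
            )
    if not header_ids:
        problems.append("header declares no @RG record")
    return problems
```

An empty list means every read points at a read group declared in the header. Any reported problem suggests the BAM file needs one of the tools listed above.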
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -15,7 +15,7 @@ Then, DROP can be executed in multiple ways (:doc:`pipeline`).
    pipeline
    output
    license
-   help
+   troubleshooting
 
 Quickstart
 -----------

diff --git a/docs/source/output.rst b/docs/source/output.rst
@@ -108,3 +108,38 @@ Additionally the ``mae`` module creates the following files:
     * this is the file linked in the HTML document and described above
 * ``Output/processed_results/mae/{drop_group}/MAE_results_{annotation}_rare.tsv``
     * this file is a subset of ``MAE_results_{annotation}.tsv`` with only the variants that pass the allele frequency cutoffs. If ``add_AF`` is set to ``true`` in the config file, a variant must meet the allele frequency cutoff set by ``max_AF``. Additionally, its inner-cohort frequency must meet the ``maxVarFreqCohort`` cutoff.
+
+RNA Variant Calling
++++++++++++++++++++++++
+
+HTML file
+##########
+Looking at the resulting ``Output/html/drop_demo_index.html``, we can see the ``rnaVariantCalling``
+tab at the top of the screen. The Overview tab contains links to the following:
+
+* Results for each rvc batch
+    * a table summarizing the variants and genotypes that pass the variant calling filters for each sample
+        * ``FILTER``:
+            * ``PASS_common``: passes variant calling thresholds and fails either ``max_AF`` or ``maxVarFreqCohort`` cutoffs
+            * ``PASS_rare``: passes variant calling thresholds and config ``max_AF`` and ``maxVarFreqCohort`` cutoffs
+        * ``cohortFreq``: frequency of the variant within the batch (number of samples with the variant / total samples)
+        * ``MAX_AF``: frequency of the variant from **gnomAD** if enabled
+    * a subset table showing only the ``PASS_rare`` variants
+* Boxplot and underlying table showing the distribution of variants and the effect of various filters, split by genotype
+    * ``PASS_common``: passes variant calling thresholds and fails either ``max_AF`` or ``maxVarFreqCohort`` cutoffs
+    * ``PASS_rare``: passes variant calling thresholds and config ``max_AF`` and ``maxVarFreqCohort`` cutoffs
+    * ``Seq_filter``: fails one of the default variant calling filters
+    * ``Mask``: variant falls in a repeat/mask region
+    * ``minALT``: variant passes ``Seq_filter`` but doesn't meet config ``minALT`` criteria
+* Boxplot and underlying table showing the number of variants that pass or fail the filters
+
+Local result files
+##################
+Additionally the ``rnaVariantCalling`` module creates the following output directories:
+
+* ``Output/processed_results/rnaVariantCalling/batch_vcfs``
+    * this directory contains the multi-sample vcf files for each batch
+* ``Output/processed_results/rnaVariantCalling/sample_vcfs``
+    * this directory contains the single-sample vcf files, created only if the config value ``createSingleVCF`` is set to ``true``
+* ``Output/processed_results/rnaVariantCalling/data_tables``
+    * this directory contains data tables easily imported into ``R`` using ``fread('path/to/data.table.Rds')`` for each batch of vcfs
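
The ``cohortFreq`` column and the ``PASS_common`` / ``PASS_rare`` labels described in the HTML file section can be sketched in Python as follows. This is an illustration under stated assumptions, not DROP's implementation: the function names are hypothetical, the default cutoffs are taken from the template config (``maxAF: .001``, ``maxVarFreqCohort: .1``), the cutoffs are assumed to be inclusive, and a missing gnomAD frequency is assumed to count as passing.

```python
def cohort_freq(n_samples_with_variant, n_samples_total):
    """Frequency of a variant within a batch, as described for ``cohortFreq``."""
    if n_samples_total <= 0:
        raise ValueError("n_samples_total must be positive")
    return n_samples_with_variant / n_samples_total


def filter_label(gnomad_af, cohort_frequency, max_af=0.001, max_var_freq_cohort=0.1):
    """Label a variant that has already passed the variant calling thresholds.

    Assumptions (not confirmed by the DROP source): cutoffs are inclusive,
    and ``gnomad_af=None`` (gnomAD annotation disabled or variant absent)
    is treated as passing the ``max_af`` cutoff.
    """
    rare_in_population = gnomad_af is None or gnomad_af <= max_af
    rare_in_cohort = cohort_frequency <= max_var_freq_cohort
    if rare_in_population and rare_in_cohort:
        return "PASS_rare"
    # Failing either cutoff makes the variant common.
    return "PASS_common"
```

For example, a variant seen in 1 of 20 samples with a gnomAD frequency of 0.0001 would be labelled ``PASS_rare`` under these assumptions, while the same variant seen in 5 of 20 samples would be labelled ``PASS_common``.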
diff --git a/drop/modules/mae-pipeline/MAE/ASEReadCounter.sh b/drop/modules/mae-pipeline/MAE/ASEReadCounter.sh
@@ -86,7 +86,8 @@ then
     "" "  MAE ID: ${mae_id}" \
     "  VCF file: ${vcf_file}" \
     "  BAM file: ${bam_file}" \
-    "  FASTA file: ${fasta}"
+    "  FASTA file: ${fasta}" \
+    "  Additionally, the read groups may be malformed. Please refer to https://gagneurlab-drop.readthedocs.io/en/latest/help.html for more information."
   exit 1
 fi
 

diff --git a/drop/template/config.yaml b/drop/template/config.yaml
@@ -83,7 +83,7 @@ rnaVariantCalling:
         - Data/1000G_phase1.snps.high_confidence.hg19.sites.chrPrefix.vcf.gz
     dbSNP: Data/00-All.vcf.gz
     repeat_mask: Data/hg19_repeatMasker_sorted.chrPrefix.bed
-    createSingleVCF: true                    
+    createSingleVCF: true
     addAF: true
     maxAF: .001
     maxVarFreqCohort: .1