update documentation
Smith Nicholas committed Jul 18, 2022
1 parent 241ecaa commit 7aee7a6
Showing 5 changed files with 48 additions and 6 deletions.
12 changes: 9 additions & 3 deletions docs/source/help.rst
@@ -1,6 +1,12 @@
Help
====
Troubleshooting
===============

If you run into any problems, please open an issue on `GitHub <https://github.com/gagneurlab/drop>`_.

You can also write an e-mail to yepez@in.tum.de or mumichae@in.tum.de
A common problem occurs during the ``MAE:mae_allelicCounts`` step when the BAM file does not have correct ``Read Groups`` information in both the header and the reads.
You can often identify whether the BAM file is the problem by running the command ``gatk ValidateSamFile -I path/to/bam_file.bam``.

The fix often depends on the individual case, but some combination of the following tools is usually helpful:

* `samtools reheader <http://www.htslib.org/doc/samtools-reheader.html>`_
* `gatk AddOrReplaceReadGroups <https://gatk.broadinstitute.org/hc/en-us/articles/5358911906459-AddOrReplaceReadGroups-Picard->`_
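
As a rough illustration of the check-then-fix workflow described above, the following Python sketch only builds the two command lines and does not run them. The ``ValidateSamFile`` call is the one quoted in the text. The ``AddOrReplaceReadGroups`` arguments (``-RGID``, ``-RGLB``, ``-RGPL``, ``-RGPU``, ``-RGSM``) follow the Picard option names as I understand them, and every default value (``1``, ``lib1``, ``ILLUMINA``, ``unit1``) and both helper function names are hypothetical placeholders, so please check them against the linked tool documentation before use.

```python
import shlex


def validate_cmd(bam_path):
    """Build the ValidateSamFile call quoted in the troubleshooting text."""
    return ["gatk", "ValidateSamFile", "-I", bam_path]


def add_read_groups_cmd(in_bam, out_bam, sample,
                        rgid="1", library="lib1",
                        platform="ILLUMINA", unit="unit1"):
    """Build an AddOrReplaceReadGroups call.

    All default read-group values are hypothetical placeholders; real
    values depend on how the sample was sequenced.
    """
    return [
        "gatk", "AddOrReplaceReadGroups",
        "-I", in_bam,
        "-O", out_bam,
        "-RGID", rgid,      # read group ID
        "-RGLB", library,   # library name
        "-RGPL", platform,  # sequencing platform
        "-RGPU", unit,      # platform unit
        "-RGSM", sample,    # sample name
    ]


if __name__ == "__main__":
    # Print shell-safe command lines for manual inspection before running.
    print(shlex.join(validate_cmd("path/to/bam_file.bam")))
    print(shlex.join(add_read_groups_cmd(
        "path/to/bam_file.bam", "path/to/fixed.bam", sample="sample1")))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere, since the right read-group values are case specific.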
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -15,7 +15,7 @@ Then, DROP can be executed in multiple ways (:doc:`pipeline`).
pipeline
output
license
help
troubleshooting

Quickstart
-----------
35 changes: 35 additions & 0 deletions docs/source/output.rst
@@ -108,3 +108,38 @@ Additionally the ``mae`` module creates the following files:
* this is the file linked in the HTML document and described above
* ``Output/processed_results/mae/{drop_group}/MAE_results_{annotation}_rare.tsv``
* this file is a subset of ``MAE_results_{annotation}.tsv`` with only the variants that pass the allele frequency cutoffs. If ``add_AF`` is set to ``true`` in the config file, variants must meet the allele frequency cutoff set by ``max_AF``. Additionally, the inner-cohort frequency must meet the ``maxVarFreqCohort`` cutoff

RNA Variant Calling
+++++++++++++++++++++++

HTML file
##########
Looking at the resulting ``Output/html/drop_demo_index.html`` we can see the ``rnaVariantCalling``
tab at the top of the screen. The Overview tab contains links to the following:

* Results for each rvc batch
* a table summarizing the variants and genotypes that pass the variant calling filters for each sample
* FILTER:
* ``PASS_common``: passes variant calling thresholds and fails either ``max_AF`` or ``maxVarFreqCohort`` cutoffs
* ``PASS_rare``: passes variant calling thresholds and config ``max_AF`` and ``maxVarFreqCohort`` cutoffs
* ``cohortFreq``: frequency of the variant within the batch (number of samples with the variant / total samples)
* ``MAX_AF``: frequency of the variant from **gnomAD** if enabled
* a subset table showing only the ``PASS_rare`` variants
* Boxplot and underlying table showing the distribution of variants and the effect of various filters, split by genotype
* ``PASS_common``: passes variant calling thresholds and fails either ``max_AF`` or ``maxVarFreqCohort`` cutoffs
* ``PASS_rare``: passes variant calling thresholds and config ``max_AF`` and ``maxVarFreqCohort`` cutoffs
* ``Seq_filter``: fails one of the default variant calling filters
* ``Mask``: variant falls in a repeat/mask region
* ``minALT``: variant passes ``Seq_filter`` but doesn't meet config ``minALT`` criteria
* Boxplot and underlying table showing the number of variants that pass or fail the filters
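
The ``PASS_rare`` / ``PASS_common`` distinction and the ``cohortFreq`` definition listed above can be sketched in a few lines of Python. This is only an illustration and not the pipeline's own code: the function names are hypothetical, the defaults mirror the ``maxAF: .001`` and ``maxVarFreqCohort: .1`` values from the template config in this commit, and treating each cutoff as inclusive (``<=``) and skipping the gnomAD check when no frequency is available are assumptions.

```python
def cohort_freq(samples_with_variant, total_samples):
    """Frequency of a variant within a batch:
    number of samples with the variant / total samples."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return samples_with_variant / total_samples


def pass_label(gnomad_af, cohort_frequency,
               max_af=0.001, max_var_freq_cohort=0.1):
    """Label a variant that already passed the variant calling thresholds.

    gnomad_af may be None when the gnomAD lookup is disabled; in that case
    only the cohort frequency is checked (an assumption of this sketch).
    """
    af_ok = gnomad_af is None or gnomad_af <= max_af
    cohort_ok = cohort_frequency <= max_var_freq_cohort
    # PASS_rare needs both cutoffs; failing either one gives PASS_common.
    return "PASS_rare" if af_ok and cohort_ok else "PASS_common"


if __name__ == "__main__":
    # A variant seen in 1 of 20 samples with a gnomAD frequency of 0.0005.
    print(pass_label(0.0005, cohort_freq(1, 20)))
```

Variants that fail the variant calling thresholds themselves (``Seq_filter``, ``Mask``, ``minALT``) are deliberately left out, because the text does not specify how those filters are evaluated.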

Local result files
##################
Additionally the ``rnaVariantCalling`` module creates the following output directories:

* ``Output/processed_results/rnaVariantCalling/batch_vcfs``
* this directory contains the multi-sample vcf files for each batch
* ``Output/processed_results/rnaVariantCalling/sample_vcfs``
* this directory contains the single-sample vcf files if the config value ``createSingleVCF`` is set to ``true``
* ``Output/processed_results/rnaVariantCalling/data_tables``
* this directory contains data tables for each batch of vcfs, which can be imported into ``R`` using ``fread('path/to/data.table.Rds')``
3 changes: 2 additions & 1 deletion drop/modules/mae-pipeline/MAE/ASEReadCounter.sh
@@ -86,7 +86,8 @@ then
"" " MAE ID: ${mae_id}" \
" VCF file: ${vcf_file}" \
" BAM file: ${bam_file}" \
" FASTA file: ${fasta}"
" FASTA file: ${fasta}" \
" Additionally the ReadGroups may be poorly formed. Please refer to https://gagneurlab-drop.readthedocs.io/en/latest/help.html for more information "
exit 1
fi

2 changes: 1 addition & 1 deletion drop/template/config.yaml
@@ -83,7 +83,7 @@ rnaVariantCalling:
- Data/1000G_phase1.snps.high_confidence.hg19.sites.chrPrefix.vcf.gz
dbSNP: Data/00-All.vcf.gz
repeat_mask: Data/hg19_repeatMasker_sorted.chrPrefix.bed
createSingleVCF: true
createSingleVCF: true
addAF: true
maxAF: .001
maxVarFreqCohort: .1