Updated docs

- added MIP execution
henrikstranneheim · Apr 29, 2016 · 1ab6404
1 parent 32b78ba

Showing 9 changed files with 123 additions and 25 deletions.
diff --git a/docs/MIP_overview.rst b/docs/MIP_overview.rst
@@ -6,20 +6,24 @@ data.
 Overview
 --------
 MIP performs whole genome or target region analysis of sequenced single-end and/or paired-end
-reads from the Illumina plattform in fastq(.gz) format to generate annotated
+reads from the Illumina platform in fastq(.gz) format to generate annotated
 ranked potential disease causing variants. 
 MIP performs QC, alignment, coverage analysis, variant discovery and
 annotation, sample checks as well as ranking the found variants according to disease potential
-with a minimum of manual intervention. MIP is compatible with `Scout`_ for visualization of
-identified variants. 
+with a minimum of manual intervention. MIP is compatible with `Scout`_ and `Puzzle`_ for visualization of
+identified variants.
+MIP has been used in clinical production at the Clinical Genomics facility at Science for
+Life Laboratory since 2014.
 
 Features
 --------
+ - Installation
+ 	* Simple install of all programs using conda/SHELL via supplied install script 
  - Autonomous
  	* Checks that all dependencies are fulfilled before launching
- 	* Builds/downloads references and/or files missing before launching	
- 	* Decompose and normalise references and variant vcf
- 	* Splits and merges files for samples and families when relevant
+ 	* Builds/Prepares/downloads references and/or files missing before launching	
+ 	* Decompose and normalise reference(s) and variant vcf(s)
+ 	* Splits and merges files/contigs for samples and families when relevant
  - Automatic
 	* A minimal amount of hands-on time
  	* Tracks and executes all module without manual intervention
@@ -33,27 +37,32 @@ Features
  	* Simulate your analysis before performing it
  	* Redirect each modules analysis process to a temporary directory (@nodes or @login)
  	* Limit a run to a specific set of genomic intervals
+ 	* Use multiple variant callers and annotation programs
+ 	* Optionally split data into clinical variants and research variants
  - Fast
- 	* Analyses an exome trio in approximately 6 h
- 	* Analyses a genome in approximately 35 h
+ 	* Analyses an exome trio in approximately 4 h
+ 	* Analyses an X-ten sequenced genome in approximately 21 h
  	* Rapid mode analyzes a WGS sample in approximately 4 h using a data reduction and parallelization scheme
  - Traceability
  	* Recreate your analysis from the MIP log or generated config files
  	* Logs sample meta-data and sequence meta-data
  	* Logs version numbers of softwares and databases
  	* Checks sample integrity (sex and relationship)
+ 	* Tests data output existence and integrity using automated tests
  - Annotation
  	* Gene annotation
  		* Summarise over all transcript and output on gene level
  	* Transcript level annotation
  		* Separate pathogenic transcripts for correct downstream annotation
  	* Annotate all alleles for a position
  		* Split multi-allelic records into single records to ease annotation
+ 	* Annotate coverage across genetic regions using Chanjo
+ 	* Extracts QC-metrics and stores them in YAML format
  - Standardized
  	* Use standard formats whenever possible
  - Visualization
   	* Ranks variants according to pathogenic potential
- 	* Output is directly compatibel with Scout
+ 	* Output is directly compatible with Scout and Puzzle
 
 
 Example Usage
@@ -205,6 +214,7 @@ This is an example of a workflow that MIP can perform (used @CMMS).
 
 
 .. _Scout: https://github.com/Clinical-Genomics/scout
+.. _Puzzle: https://github.com/robinandeer/puzzle
 .. _PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
 .. _Mosaik: https://github.com/wanpinglee/MOSAIK
 .. _BWA: http://bio-bwa.sourceforge.net/

diff --git a/docs/change_log.rst b/docs/change_log.rst
@@ -1,5 +1,89 @@
 Change Log
 ===========
+
+MIP v2.4 --> v2.6
+
+- Updated GATK to 3.5
+- Added static binning capability for base recalibration (BQSR)
+- Added option --disable_indel_quals to BQSR
+- Added limit for exomes to only use target bases in recalibration
+- Added MTAF to SnpEff and vcfParser for MT frequency annotations
+- Added 'trio' detection to parameters instead of scriptParameters to avoid writing key to config
+- Fixed bug when supplying -sambambaDepthCutOffs on cmd
+- MIP now handles updating to absolute path for comma separated parameters correctly
+- Removed write to cmd string in mip log for some internal parameters
+- Updated install script
+	- Added PIP to the conda env upon creation
+	- Add check that conda is executable in system before launching rest of installation
+	- Install script can now detect existing conda env and change cmd to accommodate
+	- Added sambamba (0.5.9), vt (2015.11.10), bedtools (2.25.0), htslib (1.2.1) to bioconda install
+	- Added option to prefer Bioconda install over shell for overlapping modules
+	- Added soft link creation sub routine
+	- Use soft link sub for sambamba (both bioconda and Shell)
+	- Add soft link to snpEff and SnpSift for bioconda install
+- Updated FastQC to 0.11.4 via bioconda
+- Updated SnpEff to v4_2 via Shell
+- Updated Plink to v1.90b3.26 64-bit (26 Nov 2015) via shell
+- Updated vcfTools to 0.1.14 via SHELL
+- Updated Chanjo to 3.1.1 via PIP
+- Updated Genmod to 3.4 via PIP
+- Updated Picardtools to 1.141 via bioconda
+- Updated Samtools to 1.3
+- Updated bcfTools to 1.3
+- Updated htslib to 1.3
+- Added picardTools installation via SHELL
+- Updated VEP to 83 via SHELL
+	- Trouble with distribution - htslib and sereal (issues only with testing, not with actually running the script)
+	- Added installation of VEP plugin UpDownDistance
+- Added use of VEP plugin UpDownDistance for MT contig only to avoid over annotation of the compact MT genome
+- Added padding to 10 nucleotides for MT in Vcfparser
+- Added test for undetermined in fastq file name and adjust qc-test to skip entirely for these reads
+- Added samtools mpileup 
+- Added GATKCombineVariants to combine variants calls from multiple variant callers
+- Added generalisation for supporting multiple variant callers in MIP dependencies and GATKCombineVariants
+- Added no-fail to sample check
+- Modified installation of picardTools and SnpEff
+- Add filtering to variant calls from samtools mpileup
+- Add samtools/bcfTools versions 
+- Add removal of samtools pileup files
+- Added test::Harness for TAP summary results and future inclusion of additional test scripts
+- Add option to determine priority in variant callers as comma sep string
+- Add check of variant callers active compared to prioritise flag
+- Add sanity check of prioritisation flag
+- Add option to turn on or off installation of programs in install.pl
+- Added bcf file compression and indexing as sub
+- Added vcfTobcf sub to GATKCombineVariants
+- Switched vcf ready file from GATKVariantRecalibration to GATKCombineVariants
+- Added Freebayes variant caller
+	- Added to removeRedundantFiles
+	- Added Freebayes version to qcCollect
+- RemoveRedundant files info is now recorded in definition.yaml
+- Added GATKCombineVariants to removeRedundant files
+- Add bcftools norm to samtools pileup and freebayes output
+- Add lastlogFilePath to qc_sampleInfo
+- Made lanes and readDirections info more nested
+- Add 1000G Phase 3 and Exac to Genmod annotation
+- Changed regEx in test.t to include all until “,” for INFO fields in Header
+- Modified bioconda softlinks sub call to only execute if programs are installed
+- Added MT.codon table sub for snpEff config to install script
+- Remake GENMOD CADD file option to array
+- Added padded target intervals to exome analysis again for GATKRealign and GATKHaplotypeCaller
+- Reactivate GATKPaddedTarget parameter
+- Made associatedPrograms arg into an array instead of a comma sep string
+- Fixed check for when a capture kit is lacking from input, falling back to using “latest”
+- Remade CheckParameterFiles to work with DataType
+- Add evaluation with NIST as a module in MIP
+- Fix the . mip.sh to bash mip.sh
+- Added reference to define/definitions
+- CheckParameterFiles now works with parameterExistsCheck directly instead of “d” and “f” enabling merge of directory and file sections
+- Changed if for intervalListFile to be if($IntervalList ) instead of analysisTypeExome|rapid
+- Add programType=Aligner to define/definitions 
+- Remade sanity check of aligner to count if more than 1 aligner has been switched on (MosaikAligner, BWASampe, BWAMEM)
+- Dynamic setting of ‘aligner’ depending on the aligner supplied by outDirectoryName
+- Renamed aligner to alignerOutDir
+- Added genmod max_af
+- Added canonical to VEP features
+
 MIP v2.0 --> v2.4
 
 - Bugfixes

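The v2.4 --> v2.6 entries in this change log mention a comma-separated priority string for variant callers, a check of that string against the active callers, and a sanity check of the flag itself. The Python sketch below shows what such a validation could look like. It is an illustration only, not MIP's Perl code: the function name, the caller names used as input, and the exact rules (no duplicates, priority list must match the active callers) are assumptions.

```python
def check_prioritisation(priority: str, active_callers: set) -> list:
    """Validate a comma-separated caller priority string (hypothetical rules).

    Returns the callers in priority order, or raises ValueError.
    """
    # Split the flag value and drop empty elements such as a trailing comma.
    order = [name.strip() for name in priority.split(",") if name.strip()]

    # Sanity check: each caller may appear only once.
    if len(order) != len(set(order)):
        raise ValueError("duplicate caller in priority string: " + priority)

    # Every active caller needs a priority.
    missing = active_callers - set(order)
    if missing:
        raise ValueError("active callers without priority: " + ", ".join(sorted(missing)))

    # Every prioritised caller must actually be switched on.
    inactive = set(order) - active_callers
    if inactive:
        raise ValueError("prioritised callers not active: " + ", ".join(sorted(inactive)))

    return order


# Example input; the caller names are placeholders chosen for the sketch.
print(check_prioritisation("gatk,samtools,freebayes", {"gatk", "samtools", "freebayes"}))
```

Raising on any mismatch keeps a misconfigured run from starting, which fits the "checks before launching" behaviour the overview describes.
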
diff --git a/docs/conf.py b/docs/conf.py
@@ -51,9 +51,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '2.4'
+version = '2.7'
 # The full version, including alpha/beta/rc tags.
-release = '2.4.0'
+release = '2.7.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

diff --git a/docs/configuration_file.rst b/docs/configuration_file.rst
@@ -19,7 +19,6 @@ Entries in the configuration file containing the following "dynamic strings" wil
   * ANALYSISCONSTANTPATH! = 'analysisConstantPath: {value}'
   * ANALYSISTYPE! = 'analysisType: {value}'
   * FDN! = '-f familyID' (from command line)
-  * IDN! = '-s sampleIDs' (from command line), configuration file or supplied pedigree file
 
 For instance, the pedigree file entry in the configuration file can be supplied like this:
 

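The dynamic strings in this hunk (`ANALYSISCONSTANTPATH!`, `ANALYSISTYPE!`, `FDN!`) act as placeholders that are replaced with configured values. The Python sketch below shows that kind of substitution. It is a rough illustration, not MIP's Perl implementation: the function name, the example path, and the example values are hypothetical.

```python
def expand_dynamic_strings(entry: str, values: dict) -> str:
    """Replace each 'NAME!' placeholder found in entry with its value."""
    for placeholder, value in values.items():
        entry = entry.replace(placeholder, value)
    return entry


# Hypothetical values; in MIP these would come from the config file or command line.
values = {
    "ANALYSISTYPE!": "exomes",
    "FDN!": "family1",
}

# Hypothetical config entry that uses two of the dynamic strings.
entry = "/data/ANALYSISTYPE!/FDN!/FDN!_pedigree.txt"
print(expand_dynamic_strings(entry, values))
# prints /data/exomes/family1/family1_pedigree.txt
```
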
diff --git a/docs/index.rst b/docs/index.rst
@@ -21,6 +21,7 @@ Contents:
    change_log
    installation
    setup
+   MIP_execution
    adding-new-programs
    structure
    vcfParser

diff --git a/docs/pedigree_file.rst b/docs/pedigree_file.rst
@@ -49,7 +49,7 @@ Pedigree capture kits aliases
   * Agilent_SureSelect.V4 => Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed
   * Agilent_SureSelect.V5 => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
   * Agilent_SureSelectCRE.V1 => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
-  * Latest => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
+  * Latest => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
 
 * NimbleGen
 
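This hunk repoints the `Latest` alias from the V5 target file to the CRE.V1 target file. The Python sketch below shows an alias lookup that reflects the new mapping. The Agilent entries are copied from the hunk, with `GenomeReferenceSourceVersion` kept verbatim as the docs' placeholder. The function name is hypothetical, and falling back to `Latest` when no kit is given is an assumption based on the change log entry about a missing capture kit.

```python
# Agilent aliases as listed in docs/pedigree_file.rst after this commit.
CAPTURE_KIT_ALIASES = {
    "Agilent_SureSelect.V4": "Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelect.V5": "Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelectCRE.V1": "Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed",
    "Latest": "Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed",
}


def resolve_capture_kit(alias=None):
    """Return the target file for an alias; a missing alias falls back to 'Latest' (assumed)."""
    if not alias:
        alias = "Latest"
    # An unknown alias raises KeyError rather than silently picking a kit.
    return CAPTURE_KIT_ALIASES[alias]


print(resolve_capture_kit())
# prints Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
```
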

diff --git a/docs/setup.rst b/docs/setup.rst
@@ -26,24 +26,26 @@ are tested for compatibility with MIP.
   DateTime::Format::HTTP, DateTime::Format::Mail, Set::IntervalTree from CPAN, since these
   are not included in the perl standard distribution
 - Simple Linux Utility for Resource Management (`SLURM`_)
-- `FastQC`_ (version: 0.11.2)
+- `FastQC`_ (version: 0.11.4)
 - `Mosaik`_ (version: 2.2.24)
 - `BWA`_ (version: 0.7.12)
 - `Sambamba`_ (version: 0.5.9)
-- `SAMTools`_ (version: 1.2)
+- `SAMTools`_ (version: 1.3)
 - `BedTools`_ (version: 2.25.0)
-- `PicardTools`_ (version: 1.139)
-- `Chanjo`_ (version: 3.1.0)
-- `GATK`_ (version: 3.4-46)
+- `PicardTools`_ (version: 2.0.1)
+- `Chanjo`_ (version: 3.3.0)
+- `GATK`_ (version: 3.5-0)
+- `freebayes`_ (version: 1.2)
 - `VT`_ (version: 0.5)
-- `VEP`_ (version: 82)
+- `VEP`_ (version: 83) with plugin "UpDownDistance"
 - vcfParser.pl (Supplied with MIP; see :doc:`vcfParser`)
-- `SnpEff`_ (4.1)
+- `SnpEff`_ (4.2)
 - `ANNOVAR`_ (version: 2013-08-23)
-- `GENMOD`_ (version: 3.4.0)
-- `VcfTools`_ (version: 0.1.12b)
-- `BcfTools`_ (version: 1.1)
+- `GENMOD`_ (version: 3.4.8)
+- `VcfTools`_ (version: 0.1.14)
+- `BcfTools`_ (version: 1.3)
 - `PLINK`_ (version: 1.90b3x)
+- `MultiQC`_ (version: 0.5)
 
 Depending on what programs you include in the MIP analysis you also need to add
 these programs to your ``$PATH``:
@@ -133,6 +135,7 @@ directory using Annovars built-in download function.
 .. _PicardTools: http://picard.sourceforge.net/
 .. _Chanjo: https://chanjo.readthedocs.org/en/latest/
 .. _GATK: http://www.broadinstitute.org/gatk/
+.. _freebayes: https://github.com/ekg/freebayes
 .. _VT: https://github.com/atks/vt
 .. _VEP: http://www.ensembl.org/info/docs/tools/vep/index.html
 .. _SnpEff: http://snpeff.sourceforge.net/
@@ -142,6 +145,7 @@ directory using Annovars built-in download function.
 .. _VcfTools: http://vcftools.sourceforge.net/
 .. _BcfTools: https://samtools.github.io/bcftools/bcftools.html
 .. _PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
+.. _MultiQC: https://github.com/ewels/MultiQC
 .. _Cosmid: https://github.com/robinandeer/cosmid
 .. _Tabix: http://samtools.sourceforge.net/tabix.shtml
 .. _pyenv: https://github.com/yyuu/pyenv

diff --git a/docs/tables/pedigree_file_optional_columns.csv b/docs/tables/pedigree_file_optional_columns.csv
@@ -1 +1 @@
-ColumnName,Default Value,Type,SummaryCMMSID,Na,``String``,The clinics identification number for the individualTissue_origin,Na,``String``,Tissue of Isolation (DNA/RNA)Isolation_kit,Na,``String``,Kit used to isolate nucleic acidsIsolation_date,Na,``Integer``,Date of performing isolation of nucleic acidsIsolation_personnel,Na,``String``,Personnel performing isolation of nucleic acidsMedical_doctor,Na,``String``,Responsible clinician(s)Inheritance_model,Na,``String``,Probable disease genetic model inheritance within pedigreePhenotype_terms,Na,``String``,Phenotypic terms associated with the disorderCMMS_seqID,Na,``String``,Batch identificationSciLifeID,Na,``String``,ScilifeLab identification Capture_kit,Na,``String``,Capture kit used in library preparationCapture_date,Na,``Integer``,Date of performing capture procedureCapture_personnel,Na,``String``,Personnel performing capture procedureClustering_date,Na,``Integer``,Date of clusteringSequencing_kit,Na,``String``,Sequencing kitClinical_db,dbCMMS,``String``,The clinical databaseClinical_db_gene_annotation,IEM,``String``,Genes associated with a disease group within the clinical database
+ColumnName,Default Value,Type,SummaryCMMSID,Na,``String``,The clinics identification number for the individualTissue_origin,Na,``String``,Tissue of Isolation (DNA/RNA)Isolation_kit,Na,``String``,Kit used to isolate nucleic acidsIsolation_date,Na,``Integer``,Date of performing isolation of nucleic acidsIsolation_personnel,Na,``String``,Personnel performing isolation of nucleic acidsMedical_doctor,Na,``String``,Responsible clinician(s)Inheritance_model,Na,``String``,Probable disease genetic model inheritance within pedigreePhenotype_terms,Na,``String``,Phenotypic terms associated with the disorderCMMS_seqID,Na,``String``,Batch identificationSciLifeID,Na,``String``,ScilifeLab identification Capture_kit,Na,``String``,Capture kit used in library preparationCapture_date,Na,``Integer``,Date of performing capture procedureCapture_personnel,Na,``String``,Personnel performing capture procedureClustering_date,Na,``Integer``,Date of clusteringSequencing_kit,Na,``String``,Sequencing kitClinical_db,dbCMMS,``String``,The clinical databaseClinical_db_gene_annotation,IEM,``String``,Genes associated with a disease group within the clinical databaseSequencing_type,Na,``String``,Type of sequencing performed

diff --git a/docs/vcfParser.rst b/docs/vcfParser.rst
@@ -1,5 +1,5 @@
 vcfParser
-==========
+=========
 Parses vcf files to reformat/add INFO fields and metaData headers and/or select entries 
 belonging to a subgroup e.g. a list of genes. Input can be piped or supplied as an infile.