Updated docs
- added MIP execution
henrikstranneheim committed Apr 29, 2016
1 parent 32b78ba commit 1ab6404
Showing 9 changed files with 123 additions and 25 deletions.
28 changes: 19 additions & 9 deletions docs/MIP_overview.rst
@@ -6,20 +6,24 @@ data.
Overview
--------
MIP performs whole genome or target region analysis of sequenced single-end and/or paired-end
reads from the Illumina plattform in fastq(.gz) format to generate annotated
reads from the Illumina platform in fastq(.gz) format to generate annotated
ranked potential disease causing variants.
MIP performs QC, alignment, coverage analysis, variant discovery and
annotation, sample checks as well as ranking the found variants according to disease potential
with a minimum of manual intervention. MIP is compatible with `Scout`_ for visualization of
identified variants.
with a minimum of manual intervention. MIP is compatible with `Scout`_ and `Puzzle`_ for visualization of
identified variants.
MIP has been in use in the clinical production at the Clinical Genomics facility at Science for
Life Laboratory since 2014.

Features
--------
- Installation
* Simple install of all programs using conda/SHELL via supplied install script
- Autonomous
* Checks that all dependencies are fulfilled before launching
* Builds/downloads references and/or files missing before launching
* Decompose and normalise references and variant vcf
* Splits and merges files for samples and families when relevant
* Builds/Prepares/downloads references and/or files missing before launching
* Decompose and normalise reference(s) and variant vcf(s)
* Splits and merges files/contigs for samples and families when relevant
- Automatic
* A minimal amount of hands-on time
* Tracks and executes all modules without manual intervention
@@ -33,27 +37,32 @@ Features
* Simulate your analysis before performing it
* Redirect each module's analysis process to a temporary directory (@nodes or @login)
* Limit a run to a specific set of genomic intervals
* Use multiple variant callers and annotation programs
* Optionally split data into clinical variants and research variants
- Fast
* Analyses an exome trio in approximately 6 h
* Analyses a genome in approximately 35 h
* Analyses an exome trio in approximately 4 h
* Analyses an X-ten sequenced genome in approximately 21 h
* Rapid mode analyses a WGS sample in approximately 4 h using a data reduction and parallelization scheme
- Traceability
* Recreate your analysis from the MIP log or generated config files
* Logs sample meta-data and sequence meta-data
* Logs version numbers of software and databases
* Checks sample integrity (sex and relationship)
* Tests data output existence and integrity using automated tests
- Annotation
* Gene annotation
* Summarise over all transcripts and output at gene level
* Transcript level annotation
* Separate pathogenic transcripts for correct downstream annotation
* Annotate all alleles for a position
* Split multi-allelic records into single records to ease annotation
* Annotate coverage across genetic regions using Chanjo
* Extracts QC-metrics and stores them in YAML format
- Standardized
* Use standard formats whenever possible
- Visualization
* Ranks variants according to pathogenic potential
* Output is directly compatible with Scout
* Output is directly compatible with Scout and Puzzle


Example Usage
@@ -205,6 +214,7 @@ This is an example of a workflow that MIP can perform (used @CMMS).


.. _Scout: https://github.com/Clinical-Genomics/scout
.. _Puzzle: https://github.com/robinandeer/puzzle
.. _PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
.. _Mosaik: https://github.com/wanpinglee/MOSAIK
.. _BWA: http://bio-bwa.sourceforge.net/
84 changes: 84 additions & 0 deletions docs/change_log.rst
@@ -1,5 +1,89 @@
Change Log
===========

MIP v2.4 --> v2.6

- Updated GATK to 3.5
- Added static binning capability for base recalibration (BQSR)
- Added option --disable_indel_quals to BQSR
- Added limit for exomes to only use target bases in recalibration
- Added MTAF to SnpEff and vcfParser for MT frequency annotations
- Added 'trio' detection to parameters instead of scriptParameters to avoid writing key to config
- Fixed bug when supplying -sambambaDepthCutOffs on cmd
- MIP now handles updating to absolute path for comma separated parameters correctly
- Removed write to cmd string in mip log for some internal parameters
- Updated install script
- Added PIP to the conda env upon creation
- Add check that conda is executable in system before launching rest of installation
- Install script can now detect existing conda env and change cmd to accommodate
- Added sambamba (0.5.9), vt (2015.11.10), bedtools (2.25.0), htslib (1.2.1) to bioconda install
- Added option to prefer Bioconda install over shell for overlapping modules
- Added soft link creation sub routine
- Use soft link sub for sambamba (both bioconda and Shell)
- Add soft link to snpEff and SnpSift for bioconda install
- Update FASTQC to 0.11.4 via bioconda
- Updated SnpEff to v4_2 via Shell
- Updated Plink to v1.90b3.26 64-bit (26 Nov 2015) via shell
- Updated vcfTools to 0.1.14 via SHELL
- Updated Chanjo to 3.1.1 via PIP
- Updated Genmod to 3.4 via PIP
- Updated Picardtools to 1.141 via bioconda
- Updated Samtools to 1.3
- Updated bcfTools to 1.3
- Updated htslib to 1.3
- Added picardTools installation via SHELL
- Updated VEP to 83 via SHELL
- Trouble with distribution - htslib and sereal (only issues with testing and not with actually running the script)
- Added installation of VEP plugin UpDownDistance
- Added use of VEP plugin UpDownDistance for MT contig only to avoid over annotation of the compact MT genome
- Added padding to 10 nucleotides for MT in Vcfparser
- Added test for undetermined in fastq file name and adjust qc-test to skip entirely for these reads
- Added samtools mpileup
- Added GATKCombineVariants to combine variants calls from multiple variant callers
- Added generalisation for supporting multiple variant callers in MIP dependencies and GATKCombineVariants
- Added no-fail to sample check
- Modified installation of picardTools and SnpEff
- Add filtering to variant calls from samtools mpileup
- Add samtools/bcfTools versions
- Add removal of samtools pileup files
- Added Test::Harness for TAP summary results and future inclusion of additional test scripts
- Add option to determine priority in variant callers as comma sep string
- Add check of variant callers active compared to prioritise flag
- Add sanity check of prioritisation flag
- Add option to turn on or off installation of programs in install.pl
- Added bcf file compression and indexing as sub
- Added vcfTobcf sub to GATKCombineVariants
- Switched vcf ready file from GATKVariantRecalibration to GATKCombineVariants
- Added Freebayes variant caller
- Added to removeRedundantFiles
- Added Freebayes version to qcCollect
- RemoveRedundant files info is now recorded in definition.yaml
- Added GATKCombineVariants to removeRedundant files
- Add bcftools norm to samtools pileup and freebayes output
- Add lastlogFilePath to qc_sampleInfo
- Made lanes and readDirections info more nested
- Add 1000G Phase 3 and Exac to Genmod annotation
- Changed regEx in test.t to include all until “,” for INFO fields in Header
- Modified bioconda softlinks sub call to only execute if programs are installed
- Added MT.codon table sub for snpEff config to install script
- Remake GENMOD CADD file option to array
- Added padded target intervals to exome analysis again for GATKRealign and GATKHaplotypeCaller
- Reactivate GATKPaddedTarget parameter
- Made associatedPrograms arg into an array instead of a comma sep string
- Fixed check for when a capture kit is lacking from input and fall back to using “latest”
- Remade CheckParameterFiles to work with DataType
- Add evaluation with NIST as a module in MIP
- Fix the . mip.sh to bash mip.sh
- Added reference to define/definitions
- CheckParameterFiles now works with parameterExistsCheck directly instead of “d” and “f” enabling merge of directory and file sections
- Changed if for intervalListFile to be if($IntervalList ) instead of analysisTypeExome|rapid
- Add programType=Aligner to define/definitions
- Remade sanity check of aligner to count if more than 1 aligner has been switched on (MosaikAligner, BWASampe, BWAMEM)
- Dynamic setting of ‘aligner’ depending on the aligner supplied by outDirectoryName
- Renamed aligner to alignerOutDir
- Added genmod max_af
- Added canonical to VEP features
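
The v2.4 --> v2.6 list mentions a comma-separated priority string for variant callers, a check of the active callers against that string, and a sanity check of the string itself. A minimal Python sketch of what such checks could look like follows. The function name ``check_caller_priority`` and the caller labels are hypothetical and are not taken from MIP's Perl code.

```python
from typing import List, Set


def check_caller_priority(priority: str, active_callers: Set[str]) -> List[str]:
    """Return the prioritised caller order, or raise ValueError if inconsistent.

    Hypothetical helper: the rules below are an assumption about what a
    sanity check of a comma-separated priority string might enforce.
    """
    # Split the comma-separated string and drop surrounding whitespace.
    order = [name.strip() for name in priority.split(",") if name.strip()]
    # Sanity check: no caller may be listed twice.
    if len(order) != len(set(order)):
        raise ValueError("a variant caller is listed more than once")
    # Every prioritised caller must be switched on.
    unknown = [name for name in order if name not in active_callers]
    if unknown:
        raise ValueError("prioritised but not active: " + ", ".join(unknown))
    # Every active caller must have a place in the priority order.
    missing = sorted(active_callers - set(order))
    if missing:
        raise ValueError("active but not prioritised: " + ", ".join(missing))
    return order


# Caller labels are illustrative only.
print(check_caller_priority("gatk,samtools,freebayes", {"gatk", "samtools", "freebayes"}))
```

Raising on the first inconsistency keeps the failure visible before any job is submitted, which matches the "check before launching" behaviour described under Features.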

MIP v2.0 --> v2.4

- Bugfixes
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -51,9 +51,9 @@
# built documents.
#
# The short X.Y version.
version = '2.4'
version = '2.7'
# The full version, including alpha/beta/rc tags.
release = '2.4.0'
release = '2.7.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
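
The two settings changed in this hunk have to be bumped together. One way to keep them in sync, shown here as a sketch rather than as what this ``conf.py`` does, is to derive the short ``version`` from the full ``release`` string:

```python
# Full version, including any alpha/beta/rc tag.
release = "2.7.0"
# Short X.Y version: the first two dot-separated components of ``release``.
version = ".".join(release.split(".")[:2])

print(version)  # prints 2.7
```

With this arrangement only ``release`` needs editing on the next version bump.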
1 change: 0 additions & 1 deletion docs/configuration_file.rst
@@ -19,7 +19,6 @@ Entries in the configuration file containing the following "dynamic strings" wil
* ANALYSISCONSTANTPATH! = 'analysisConstantPath: {value}'
* ANALYSISTYPE! = 'analysisType: {value}'
* FDN! = '-f familyID' (from command line)
* IDN! = '-s sampleIDs' (from command line), configuration file or supplied pedigree file
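
The substitution of dynamic strings listed above can be sketched in Python as follows. This is an illustration only: MIP itself is written in Perl, and the function name ``expand_dynamic_strings``, the example path and the example values are assumptions. Only the three placeholders that remain after this commit are handled.

```python
import re
from typing import Dict

# Placeholders that remain in the list above; each ends with "!".
PLACEHOLDER = re.compile(r"(ANALYSISCONSTANTPATH|ANALYSISTYPE|FDN)!")


def expand_dynamic_strings(entry: str, values: Dict[str, str]) -> str:
    """Replace every placeholder in a config entry with its supplied value."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], entry)


# Hypothetical values; in MIP they would come from the config or command line.
values = {
    "ANALYSISCONSTANTPATH": "/analysis",
    "ANALYSISTYPE": "exomes",
    "FDN": "family1",
}
entry = "ANALYSISCONSTANTPATH!/ANALYSISTYPE!/FDN!/FDN!_pedigree.txt"
print(expand_dynamic_strings(entry, values))
# prints /analysis/exomes/family1/family1_pedigree.txt
```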

For instance, the pedigree file entry in the configuration file can be supplied like this:

1 change: 1 addition & 0 deletions docs/index.rst
@@ -21,6 +21,7 @@ Contents:
change_log
installation
setup
MIP_execution
adding-new-programs
structure
vcfParser
2 changes: 1 addition & 1 deletion docs/pedigree_file.rst
@@ -49,7 +49,7 @@ Pedigree capture kits aliases
* Agilent_SureSelect.V4 => Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed
* Agilent_SureSelect.V5 => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
* Agilent_SureSelectCRE.V1 => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
* Latest => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
* Latest => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed

* NimbleGen

24 changes: 14 additions & 10 deletions docs/setup.rst
@@ -26,24 +26,26 @@ are tested for compatibility with MIP.
DateTime::Format::HTTP, DateTime::Format::Mail, Set::IntervalTree from CPAN, since these
are not included in the perl standard distribution
- Simple Linux Utility for Resource Management (`SLURM`_)
- `FastQC`_ (version: 0.11.2)
- `FastQC`_ (version: 0.11.4)
- `Mosaik`_ (version: 2.2.24)
- `BWA`_ (version: 0.7.12)
- `Sambamba`_ (version: 0.5.9)
- `SAMTools`_ (version: 1.2)
- `SAMTools`_ (version: 1.3)
- `BedTools`_ (version: 2.25.0)
- `PicardTools`_ (version: 1.139)
- `Chanjo`_ (version: 3.1.0)
- `GATK`_ (version: 3.4-46)
- `PicardTools`_ (version: 2.0.1)
- `Chanjo`_ (version: 3.3.0)
- `GATK`_ (version: 3.5-0)
- `freebayes`_ (version: 1.2)
- `VT`_ (version: 0.5)
- `VEP`_ (version: 82)
- `VEP`_ (version: 83) with plugin "UpDownDistance"
- vcfParser.pl (Supplied with MIP; see :doc:`vcfParser`)
- `SnpEff`_ (4.1)
- `SnpEff`_ (4.2)
- `ANNOVAR`_ (version: 2013-08-23)
- `GENMOD`_ (version: 3.4.0)
- `VcfTools`_ (version: 0.1.12b)
- `BcfTools`_ (version: 1.1)
- `GENMOD`_ (version: 3.4.8)
- `VcfTools`_ (version: 0.1.14)
- `BcfTools`_ (version: 1.3)
- `PLINK`_ (version: 1.90b3x)
- `MultiQC`_ (version: 0.5)
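
Before launching an analysis it can help to confirm that the listed programs are reachable. The Python sketch below uses only the standard library; the executable names are assumptions about how the tools are typically installed and may differ on your system. It is not part of MIP, which performs its own dependency checks.

```python
import shutil
from typing import Iterable, List


def missing_programs(executables: Iterable[str]) -> List[str]:
    """Return the executables that cannot be found on the current PATH."""
    return [name for name in executables if shutil.which(name) is None]


# Executable names are assumed, not taken from the MIP documentation.
required = ["bwa", "samtools", "bcftools", "sambamba", "bedtools"]
for name in missing_programs(required):
    print("not found on PATH: " + name)
```

This only checks presence, not the version numbers given in the list.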

Depending on what programs you include in the MIP analysis you also need to add
these programs to your ``$PATH``:
@@ -133,6 +135,7 @@ directory using Annovars built-in download function.
.. _PicardTools: http://picard.sourceforge.net/
.. _Chanjo: https://chanjo.readthedocs.org/en/latest/
.. _GATK: http://www.broadinstitute.org/gatk/
.. _freebayes: https://github.com/ekg/freebayes
.. _VT: https://github.com/atks/vt
.. _VEP: http://www.ensembl.org/info/docs/tools/vep/index.html
.. _SnpEff: http://snpeff.sourceforge.net/
@@ -142,6 +145,7 @@ directory using Annovars built-in download function.
.. _VcfTools: http://vcftools.sourceforge.net/
.. _BcfTools: https://samtools.github.io/bcftools/bcftools.html
.. _PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
.. _MultiQC: https://github.com/ewels/MultiQC
.. _Cosmid: https://github.com/robinandeer/cosmid
.. _Tabix: http://samtools.sourceforge.net/tabix.shtml
.. _pyenv: https://github.com/yyuu/pyenv
2 changes: 1 addition & 1 deletion docs/tables/pedigree_file_optional_columns.csv
@@ -1 +1 @@
ColumnName,Default Value,Type,Summary
CMMSID,Na,``String``,The clinics identification number for the individual
Tissue_origin,Na,``String``,Tissue of Isolation (DNA/RNA)
Isolation_kit,Na,``String``,Kit used to isolate nucleic acids
Isolation_date,Na,``Integer``,Date of performing isolation of nucleic acids
Isolation_personnel,Na,``String``,Personnel performing isolation of nucleic acids
Medical_doctor,Na,``String``,Responsible clinician(s)
Inheritance_model,Na,``String``,Probable disease genetic model inheritance within pedigree
Phenotype_terms,Na,``String``,Phenotypic terms associated with the disorder
CMMS_seqID,Na,``String``,Batch identification
SciLifeID,Na,``String``,ScilifeLab identification
Capture_kit,Na,``String``,Capture kit used in library preparation
Capture_date,Na,``Integer``,Date of performing capture procedure
Capture_personnel,Na,``String``,Personnel performing capture procedure
Clustering_date,Na,``Integer``,Date of clustering
Sequencing_kit,Na,``String``,Sequencing kit
Clinical_db,dbCMMS,``String``,The clinical database
Clinical_db_gene_annotation,IEM,``String``,Genes associated with a disease group within the clinical database
ColumnName,Default Value,Type,Summary
CMMSID,Na,``String``,The clinics identification number for the individual
Tissue_origin,Na,``String``,Tissue of Isolation (DNA/RNA)
Isolation_kit,Na,``String``,Kit used to isolate nucleic acids
Isolation_date,Na,``Integer``,Date of performing isolation of nucleic acids
Isolation_personnel,Na,``String``,Personnel performing isolation of nucleic acids
Medical_doctor,Na,``String``,Responsible clinician(s)
Inheritance_model,Na,``String``,Probable disease genetic model inheritance within pedigree
Phenotype_terms,Na,``String``,Phenotypic terms associated with the disorder
CMMS_seqID,Na,``String``,Batch identification
SciLifeID,Na,``String``,ScilifeLab identification
Capture_kit,Na,``String``,Capture kit used in library preparation
Capture_date,Na,``Integer``,Date of performing capture procedure
Capture_personnel,Na,``String``,Personnel performing capture procedure
Clustering_date,Na,``Integer``,Date of clustering
Sequencing_kit,Na,``String``,Sequencing kit
Clinical_db,dbCMMS,``String``,The clinical database
Clinical_db_gene_annotation,IEM,``String``,Genes associated with a disease group within the clinical database
Sequencing_type,Na,``String``,Type of sequencing performed
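
The table has four columns: ColumnName, Default Value, Type and Summary. As a rough illustration of how the default values could be looked up programmatically, the Python sketch below parses three of the rows with the standard ``csv`` module. The helper name ``column_defaults`` is hypothetical, and the sketch assumes one record per line.

```python
import csv
import io

# Three rows copied from the table; one record per line is assumed.
SAMPLE = """\
ColumnName,Default Value,Type,Summary
CMMSID,Na,``String``,The clinics identification number for the individual
Clinical_db,dbCMMS,``String``,The clinical database
Sequencing_type,Na,``String``,Type of sequencing performed
"""


def column_defaults(csv_text):
    """Map each optional pedigree column name to its default value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ColumnName"]: row["Default Value"] for row in reader}


defaults = column_defaults(SAMPLE)
print(defaults["Clinical_db"])  # prints dbCMMS
```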
2 changes: 1 addition & 1 deletion docs/vcfParser.rst
@@ -1,5 +1,5 @@
vcfParser
==========
=========
Parses vcf files to reformat/add INFO fields and metaData headers and/or select entries
belonging to a subgroup e.g. a list of genes. Input can be piped or supplied as an infile.
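
To illustrate the "select entries belonging to a subgroup" idea, here is a small Python sketch that keeps header lines and only those records whose INFO column names a gene of interest. It is not vcfParser's implementation, which is a Perl script supplied with MIP. The INFO key ``GENE`` and the function name ``select_by_gene`` are assumptions made for the example.

```python
from typing import Iterable, Iterator, Set


def select_by_gene(lines: Iterable[str], genes: Set[str], key: str = "GENE") -> Iterator[str]:
    """Yield header lines plus records whose INFO[key] lists a wanted gene."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-data and column header lines pass through
            continue
        fields = line.rstrip("\n").split("\t")
        # INFO is the eighth tab-separated column of a VCF record.
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        if genes.intersection(info.get(key, "").split(",")):
            yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\tGENE=ABC1;DP=10\n",
    "2\t200\t.\tC\tT\t50\tPASS\tGENE=XYZ2\n",
]
selected = list(select_by_gene(vcf, {"ABC1"}))
print(len(selected))  # prints 3: two header lines and one record
```

Because the function accepts any iterable of lines, the same call works for an open file or for ``sys.stdin`` when input is piped.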
