Skip to content

Latest commit

 

History

History
114 lines (83 loc) · 12.9 KB

result_interpretation.rst

File metadata and controls

114 lines (83 loc) · 12.9 KB

Interpreting the Results

Depending on the output options provided, Exomiser will write out at least an HTML results file in the results sub-directory of the Exomiser installation.

As a general rule all output files contain a ranked list of genes and/or variants with the top-ranked gene/variant displayed first. The exception being the VCF output which, since version 13.1.0, is sorted according to VCF convention and tabix indexed.

Exomiser attempts to predict the variant or variants likely to be causative of a patient's phenotype and does so by associating them with the gene (or genes in the case of large structural variations) they intersect with on the genomic sequence. Variants occurring in intergenic regions are associated to the closest gene and those overlapping two genes are associated with the gene in which they are predicted to have the largest consequence.

Once associated with a gene, Exomiser uses the compatible modes of inheritance for a variant to assess it in the context of any diseases associated with the gene or any mouse knockout models of that gene. These are all bundled together into a GeneScore which features filtered variants located in that gene compatible with a given mode of inheritance. After the filtering steps Exomiser ranks these GeneScores according to descending combined score. The results are then written out to the files and formats specified in the output settings.

As of release 13.2.0 the output files only feature a single, combined output file of ranked genes/variants e.g. when supplying the output options (via an output-options.yaml file) outputFileName: Pfeiffer-hiphive-exome-PASS_ONLY and outputFormats: [TSV_VARIANT, TSV_GENE, VCF], the following files will be written out: Pfeiffer-hiphive-exome-PASS_ONLY.variants.tsv Pfeiffer-hiphive-exome-PASS_ONLY.genes.tsv, Pfeiffer-hiphive-exome-PASS_ONLY.vcf

These formats are detailed below.

HTML

JSON

The JSON file represents the most accurate representation of the data, as it is referenced internally by Exomiser. As such, we don't provide a schema for this, but it has been pretty stable and breaking changes will only occur with major version changes to the software. Minor additions are to be expected for minor releases, as per the SemVer specification.

We recommend using Python or JQ to extract data from this file.

TSV GENES

In the genes.tsv file it is possible for a gene to appear multiple times, depending on the MOI it is compatible with, given the filtered variants. For example in the example below MUC6 is ranked 7th under the AD model and 8th under an AR model.

#RANK   ID  GENE_SYMBOL ENTREZ_GENE_ID  MOI P-VALUE EXOMISER_GENE_COMBINED_SCORE    EXOMISER_GENE_PHENO_SCORE   EXOMISER_GENE_VARIANT_SCORE HUMAN_PHENO_SCORE   MOUSE_PHENO_SCORE   FISH_PHENO_SCORE    WALKER_SCORE    PHIVE_ALL_SPECIES_SCORE OMIM_SCORE  MATCHES_CANDIDATE_GENE  HUMAN_PHENO_EVIDENCE    MOUSE_PHENO_EVIDENCE    FISH_PHENO_EVIDENCE HUMAN_PPI_EVIDENCE  MOUSE_PPI_EVIDENCE  FISH_PPI_EVIDENCE
1   FGFR2_AD    FGFR2   2263    AD  0.0000  0.9981  1.0000  1.0000  0.8808  1.0000  0.0000  0.5095  1.0000  1.0000  0   Jackson-Weiss syndrome (OMIM:123150): Brachydactyly (HP:0001156)-Broad hallux (HP:0010055), Craniosynostosis (HP:0001363)-Craniosynostosis (HP:0001363), Broad thumb (HP:0011304)-Broad metatarsal (HP:0001783), Broad hallux (HP:0010055)-Broad hallux (HP:0010055),   Brachydactyly (HP:0001156)-abnormal sternum morphology (MP:0000157), Craniosynostosis (HP:0001363)-premature cranial suture closure (MP:0000081), Broad thumb (HP:0011304)-abnormal sternum morphology (MP:0000157), Broad hallux (HP:0010055)-abnormal sternum morphology (MP:0000157),        Proximity to FGF14 associated with Spinocerebellar ataxia 27 (OMIM:609307): Broad hallux (HP:0010055)-Pes cavus (HP:0001761),   Proximity to FGF14 Brachydactyly (HP:0001156)-abnormal digit morphology (MP:0002110), Broad thumb (HP:0011304)-abnormal digit morphology (MP:0002110), Broad hallux (HP:0010055)-abnormal digit morphology (MP:0002110),
2   ENPP1_AD    ENPP1   5167    AD  0.0049  0.8690  0.5773  0.9996  0.6972  0.5773  0.5237  0.5066  0.6972  1.0000  0   Autosomal recessive hypophosphatemic rickets (ORPHA:289176): Brachydactyly (HP:0001156)-Genu varum (HP:0002970), Craniosynostosis (HP:0001363)-Craniosynostosis (HP:0001363), Broad thumb (HP:0011304)-Tibial bowing (HP:0002982), Broad hallux (HP:0010055)-Genu varum (HP:0002970),   Brachydactyly (HP:0001156)-fused carpal bones (MP:0008915), Craniosynostosis (HP:0001363)-abnormal nucleus pulposus morphology (MP:0006392), Broad thumb (HP:0011304)-fused carpal bones (MP:0008915), Broad hallux (HP:0010055)-fused carpal bones (MP:0008915),   Craniosynostosis (HP:0001363)-ceratohyal cartilage premature perichondral ossification, abnormal (ZP:0012007), Broad thumb (HP:0011304)-cleithrum nodular, abnormal (ZP:0006782),   Proximity to PAPSS2 associated with Brachyolmia 4 with mild epiphyseal and metaphyseal changes (OMIM:612847): Brachydactyly (HP:0001156)-Brachydactyly (HP:0001156), Broad thumb (HP:0011304)-Brachydactyly (HP:0001156), Broad hallux (HP:0010055)-Brachydactyly (HP:0001156),     Proximity to PAPSS2 Brachydactyly (HP:0001156)-abnormal long bone epiphyseal plate morphology (MP:0003055), Craniosynostosis (HP:0001363)-domed cranium (MP:0000440), Broad thumb (HP:0011304)-abnormal long bone epiphyseal plate morphology (MP:0003055), Broad hallux (HP:0010055)-abnormal long bone epiphyseal plate morphology (MP:0003055),
//
7   MUC6_AD MUC6    4588    AD  0.0096  0.7532  0.5030  0.9990  0.0000  0.0000  0.0000  0.5030  0.5030  1.0000  0                   Proximity to GKN2 Brachydactyly (HP:0001156)-brachydactyly (MP:0002544), Broad thumb (HP:0011304)-brachydactyly (MP:0002544), Broad hallux (HP:0010055)-brachydactyly (MP:0002544),
8   MUC6_AR MUC6    4588    AR  0.0096  0.7531  0.5030  0.9990  0.0000  0.0000  0.0000  0.5030  0.5030  1.0000  0                   Proximity to GKN2 Brachydactyly (HP:0001156)-brachydactyly (MP:0002544), Broad thumb (HP:0011304)-brachydactyly (MP:0002544), Broad hallux (HP:0010055)-brachydactyly (MP:0002544),

TSV VARIANTS

In the variants.tsv file it is possible for a variant, like a gene, to appear multiple times, depending on the MOI it is compatible with. For example in the example below MUC6 has two variants ranked 7th under the AD model and two ranked 8th under an AR (compound heterozygous) model. In the AD case the CONTRIBUTING_VARIANT column indicates whether the variant was (1) or wasn't (0) used for calculating the EXOMISER_GENE_COMBINED_SCORE and EXOMISER_GENE_VARIANT_SCORE.

#RANK   ID  GENE_SYMBOL ENTREZ_GENE_ID  MOI P-VALUE EXOMISER_GENE_COMBINED_SCORE    EXOMISER_GENE_PHENO_SCORE   EXOMISER_GENE_VARIANT_SCORE EXOMISER_VARIANT_SCORE  CONTRIBUTING_VARIANT    WHITELIST_VARIANT   VCF_ID  RS_ID   CONTIG  START   END REF ALT CHANGE_LENGTH   QUAL    FILTER  GENOTYPE    FUNCTIONAL_CLASS    HGVS    EXOMISER_ACMG_CLASSIFICATION    EXOMISER_ACMG_EVIDENCE  EXOMISER_ACMG_DISEASE_ID    EXOMISER_ACMG_DISEASE_NAME  CLINVAR_ALLELE_ID   CLINVAR_PRIMARY_INTERPRETATION  CLINVAR_STAR_RATING GENE_CONSTRAINT_LOEUF   GENE_CONSTRAINT_LOEUF_LOWER GENE_CONSTRAINT_LOEUF_UPPER MAX_FREQ_SOURCE MAX_FREQ    ALL_FREQ    MAX_PATH_SOURCE MAX_PATH    ALL_PATH
1   10-123256215-T-G_AD FGFR2   2263    AD  0.0000  0.9981  1.0000  1.0000  1.0000  1   1       rs121918506 10  123256215   123256215   T   G   0   100.0000    PASS    1|0 missense_variant    FGFR2:ENST00000346997.2:c.1688A>C:p.(Glu563Ala) LIKELY_PATHOGENIC   PM2,PP3_Strong,PP4,PP5  OMIM:123150 Jackson-Weiss syndrome  28333   LIKELY_PATHOGENIC   1   0.13692 0.074   0.27                REVEL   0.965   REVEL=0.965,MVP=0.9517972
2   6-132203615-G-A_AD  ENPP1   5167    AD  0.0049  0.8690  0.5773  0.9996  0.9996  1   0       rs770775549 6   132203615   132203615   G   A   0   922.9800    PASS    0/1 splice_donor_variant    ENPP1:ENST00000360971.2:c.2230+1G>A:p.? UNCERTAIN_SIGNIFICANCE  PVS1_Strong OMIM:615522 Cole disease        NOT_PROVIDED    0   0.41042 0.292   0.586   GNOMAD_E_SAS    0.0032486517    TOPMED=7.556E-4,EXAC_NON_FINNISH_EUROPEAN=0.0014985314,GNOMAD_E_NFE=0.0017907989,GNOMAD_E_SAS=0.0032486517
//
7   11-1018088-TG-T_AD  MUC6    4588    AD  0.0096  0.7532  0.5030  0.9990  0.9990  1   0       rs765231061 11  1018088 1018089 TG  T   -1  441.8100    PASS    0/1 frameshift_variant  MUC6:ENST00000421673.2:c.4712del:p.(Pro1571Hisfs*21)    UNCERTAIN_SIGNIFICANCE                  NOT_PROVIDED    0   0.79622 0.656   0.971   GNOMAD_G_NFE    0.0070363074    GNOMAD_E_AMR=0.0030803352,GNOMAD_G_NFE=0.0070363074
7   11-1018093-G-GT_AD  MUC6    4588    AD  0.0096  0.7532  0.5030  0.9990  0.9989  0   0       rs376177791 11  1018093 1018093 G   GT  1   592.4500    PASS    0/1 frameshift_elongation   MUC6:ENST00000421673.2:c.4707dup:p.(Pro1570Thrfs*136)   NOT_AVAILABLE                   NOT_PROVIDED    0   0.79622 0.656   0.971   GNOMAD_G_NFE    0.007835763 GNOMAD_G_NFE=0.007835763
8   11-1018088-TG-T_AR  MUC6    4588    AR  0.0096  0.7531  0.5030  0.9990  0.9990  1   0       rs765231061 11  1018088 1018089 TG  T   -1  441.8100    PASS    0/1 frameshift_variant  MUC6:ENST00000421673.2:c.4712del:p.(Pro1571Hisfs*21)    UNCERTAIN_SIGNIFICANCE                  NOT_PROVIDED    0   0.79622 0.656   0.971   GNOMAD_G_NFE    0.0070363074    GNOMAD_E_AMR=0.0030803352,GNOMAD_G_NFE=0.0070363074
8   11-1018093-G-GT_AR  MUC6    4588    AR  0.0096  0.7531  0.5030  0.9990  0.9989  1   0       rs376177791 11  1018093 1018093 G   GT  1   592.4500    PASS    0/1 frameshift_elongation   MUC6:ENST00000421673.2:c.4707dup:p.(Pro1570Thrfs*136)   UNCERTAIN_SIGNIFICANCE                  NOT_PROVIDED    0   0.79622 0.656   0.971   GNOMAD_G_NFE    0.007835763 GNOMAD_G_NFE=0.007835763

VCF

In the VCF file it is possible for a variant, like a gene, to appear multiple times, depending on the MOI it is compatible with. For example in the example below MUC6 has two variants ranked 7th under the AD model and two ranked 8th under an AR (compound heterozygous) model. In the AD case the CONTRIBUTING_VARIANT column indicates whether the variant was (1) or wasn't (0) used for calculating the EXOMISER_GENE_COMBINED_SCORE and EXOMISER_GENE_VARIANT_SCORE. The INFO field with the ID=Exomiser describes the internal format of this sub-field. Be aware that for multi-allelic sites, Exomiser will decompose and trim them for the proband sample and this is what will be displayed in the Exomiser ID sub-field e.g. 11-1018088-TG-T_AD.

##INFO=<ID=Exomiser,Number=.,Type=String,Description="A pipe-separated set of values for the proband allele(s) from the record with one per compatible MOI following the format: {RANK|ID|GENE_SYMBOL|ENTREZ_GENE_ID|MOI|P-VALUE|EXOMISER_GENE_COMBINED_SCORE|EXOMISER_GENE_PHENO_SCORE|EXOMISER_GENE_VARIANT_SCORE|EXOMISER_VARIANT_SCORE|CONTRIBUTING_VARIANT|WHITELIST_VARIANT|FUNCTIONAL_CLASS|HGVS|EXOMISER_ACMG_CLASSIFICATION|EXOMISER_ACMG_EVIDENCE|EXOMISER_ACMG_DISEASE_ID|EXOMISER_ACMG_DISEASE_NAME}">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    sample
10  123256215   .   T   G   100 PASS    Exomiser={1|10-123256215-T-G_AD|FGFR2|2263|AD|0.0000|0.9981|1.0000|1.0000|1.0000|1|1|missense_variant|FGFR2:ENST00000346997.2:c.1688A>C:p.(Glu563Ala)|LIKELY_PATHOGENIC|PM2,PP3_Strong,PP4,PP5|OMIM:123150|"Jackson-Weiss syndrome"};GENE=FGFR2;INHERITANCE=AD;MIM=101600   GT:DS:PL    1|0:2.000:50,11,0
11  1018088 .   TG  T   441.81  PASS    AC=1;AF=0.50;AN=2;BaseQRankSum=7.677;DP=162;DS;Exomiser={7|11-1018088-TG-T_AD|MUC6|4588|AD|0.0096|0.7532|0.5030|0.9990|0.9990|1|0|frameshift_variant|MUC6:ENST00000421673.2:c.4712del:p.(Pro1571Hisfs*21)|UNCERTAIN_SIGNIFICANCE|||""},{8|11-1018088-TG-T_AR|MUC6|4588|AR|0.0096|0.7531|0.5030|0.9990|0.9990|1|0|frameshift_variant|MUC6:ENST00000421673.2:c.4712del:p.(Pro1571Hisfs*21)|UNCERTAIN_SIGNIFICANCE|||""};FS=25.935;HRun=3;HaplotypeScore=1327.2952;MQ=43.58;MQ0=6;MQRankSum=-5.112;QD=2.31;ReadPosRankSum=2.472;set=variant    GT:AD:DP:GQ:PL  0/1:146,45:162:99:481,0,5488
11  1018093 .   G   GT  592.45  PASS    AC=1;AF=0.50;AN=2;BaseQRankSum=8.019;DP=157;Exomiser={7|11-1018093-G-GT_AD|MUC6|4588|AD|0.0096|0.7532|0.5030|0.9990|0.9989|0|0|frameshift_elongation|MUC6:ENST00000421673.2:c.4707dup:p.(Pro1570Thrfs*136)|NOT_AVAILABLE|||""},{8|11-1018093-G-GT_AR|MUC6|4588|AR|0.0096|0.7531|0.5030|0.9990|0.9989|1|0|frameshift_elongation|MUC6:ENST00000421673.2:c.4707dup:p.(Pro1570Thrfs*136)|UNCERTAIN_SIGNIFICANCE|||""};FS=28.574;HRun=1;HaplotypeScore=1267.6968;MQ=44.06;MQ0=4;MQRankSum=-5.166;QD=3.26;ReadPosRankSum=1.328;set=variant    GT:AD:DP:GQ:PL  0/1:140,42:157:99:631,0,4411
6   132203615   .   G   A   922.98  PASS    AC=1;AF=0.50;AN=2;BaseQRankSum=-0.671;DP=94;Dels=0.00;Exomiser={2|6-132203615-G-A_AD|ENPP1|5167|AD|0.0049|0.8690|0.5773|0.9996|0.9996|1|0|splice_donor_variant|ENPP1:ENST00000360971.2:c.2230+1G>A:p.?|UNCERTAIN_SIGNIFICANCE|PVS1_Strong|OMIM:615522|"Cole disease"};FS=0.805;HRun=0;HaplotypeScore=3.5646;MQ=56.63;MQ0=0;MQRankSum=1.807;QD=9.82;ReadPosRankSum=-0.900;set=variant2   GT:AD:DP:GQ:PL  0/1:53,41:94:99:953,0,1075

The VCF file is tabix-indexed and exomiser ranked alleles can be extracted using grep. For example, to display the top 5 ranked variants, you would issue the command:

zgrep -E '\{[1-5]{1}\|' Pfeiffer-hiphive-exome-PASS_ONLY.vcf.gz