
Diagnostic Workflow

Jaimee Beckett edited this page Oct 25, 2022 · 3 revisions

ATAV Diagnostic Analysis Workflow

This diagnostic analysis workflow was developed to filter and prioritize potentially pathogenic variants in patient samples.

Relevant reading material:

  • Petrovski S, et al. "Whole-exome sequencing in the evaluation of fetal structural anomalies: a prospective cohort study." The Lancet (2019).
  • Zhu X, et al. "Whole exome sequencing in undiagnosed genetic diseases: Interpreting 119 trios." Genetics in Medicine (2015).

Requirements

Initial setup

conda activate atav-cli
export ATAV_HOME=$(pwd)/atav/

# If you are not running on the database server, update the database connection settings.
# For example, replace 127.0.0.1 with your server address (which should look like
# ec2-1-1-1-1.compute-1.amazonaws.com) in this file:
# vi $ATAV_HOME/config/atav.dragen.system.config.properties

# Create an output directory; rename it for your specific project.
export PROJECT=atav_output
mkdir -p "$PROJECT"
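The config edit above can also be scripted. This is a minimal sketch, assuming the address to replace appears literally as 127.0.0.1 in the properties file; the helper name set_atav_db_host is hypothetical and not part of ATAV.

```shell
#!/usr/bin/env bash
# Sketch only: replace the default 127.0.0.1 database address in an ATAV
# config file. The helper name set_atav_db_host is hypothetical.
set -euo pipefail

set_atav_db_host() {
    local config_file="$1"   # e.g. $ATAV_HOME/config/atav.dragen.system.config.properties
    local db_host="$2"       # e.g. ec2-1-1-1-1.compute-1.amazonaws.com
    if [[ ! -f "$config_file" ]]; then
        echo "config file not found: $config_file" >&2
        return 1
    fi
    # Escape the dots so only the literal 127.0.0.1 matches; -i.bak keeps a
    # backup copy and is accepted by both GNU and BSD sed.
    sed -i.bak "s/127\.0\.0\.1/${db_host}/g" "$config_file"
}

# Usage:
# set_atav_db_host "$ATAV_HOME/config/atav.dragen.system.config.properties" \
#     ec2-1-1-1-1.compute-1.amazonaws.com
```

The backup file (with a .bak suffix) makes it easy to restore the original settings if the new address is wrong.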

Variant filtration:

[Figure: Variant filtration]

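ATAV Commands:

In the commands below, $SAMPLE is the sample file (see Prepare sample files) and $OUTPUT is the output path for the run, for example a folder inside $PROJECT.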

Trio (or Duo) analysis

Rare variants:

java -jar "$ATAV_HOME/atav_trunk.jar" --list-trio --impact HIGH,MODERATE,LOW \
  --min-ad-alt 3 --qual 30 --gq 20 --filter pass,likely,intermediate \
  --max-default-control-af 0.01 --max-gnomad-exome-af 0.01 --max-gnomad-genome-af 0.01 \
  --include-qc-missing --sample "$SAMPLE" --out "$OUTPUT"

Modifier known variants:

java -jar "$ATAV_HOME/atav_trunk.jar" --list-trio --known-var-pathogenic-only --modifier-only \
  --min-ad-alt 3 --qual 30 --gq 20 --filter pass,likely,intermediate \
  --include-qc-missing --sample "$SAMPLE" --out "$OUTPUT"

Singleton analysis

Rare variants:

java -jar "$ATAV_HOME/atav_trunk.jar" --list-singleton --impact HIGH,MODERATE,LOW \
  --min-ad-alt 3 --qual 30 --gq 20 --filter pass,likely,intermediate \
  --max-default-control-af 0.01 --max-gnomad-exome-af 0.01 --max-gnomad-genome-af 0.01 \
  --include-qc-missing --sample "$SAMPLE" --out "$OUTPUT"

Modifier known variants:

java -jar "$ATAV_HOME/atav_trunk.jar" --list-singleton --known-var-pathogenic-only --modifier-only \
  --min-ad-alt 3 --qual 30 --gq 20 --filter pass,likely,intermediate \
  --include-qc-missing --sample "$SAMPLE" --out "$OUTPUT"
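The four list commands above share the same genotype-quality filters. A minimal sketch of a wrapper that defines those filters once, assuming ATAV_HOME and PROJECT are set as in Initial setup and SAMPLE points at your sample file. The helper names, the DRY_RUN switch, and writing each run to its own folder under $PROJECT (in place of a single $OUTPUT) are hypothetical choices, not part of ATAV.

```shell
#!/usr/bin/env bash
# Sketch only: run the trio/singleton list commands with shared filters.
# Assumes ATAV_HOME, PROJECT and SAMPLE are set; helper names, DRY_RUN and
# the per-run output folders are hypothetical.
set -euo pipefail

# Genotype-quality filters shared by every list command.
QC_OPTS=(--min-ad-alt 3 --qual 30 --gq 20
         --filter pass,likely,intermediate --include-qc-missing)
# Impact and population-frequency filters for the rare-variant runs.
RARE_OPTS=(--impact HIGH,MODERATE,LOW --max-default-control-af 0.01
           --max-gnomad-exome-af 0.01 --max-gnomad-genome-af 0.01)
# Known pathogenic, non-coding (modifier) variants only.
MODIFIER_OPTS=(--known-var-pathogenic-only --modifier-only)

run_atav() {
    # DRY_RUN=1 prints the command instead of running it.
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo java -jar "$ATAV_HOME/atav_trunk.jar" "$@"
    else
        java -jar "$ATAV_HOME/atav_trunk.jar" "$@"
    fi
}

# $1 is "trio" or "singleton", selecting --list-trio or --list-singleton.
run_diagnostic_lists() {
    local mode="$1"
    run_atav "--list-$mode" "${RARE_OPTS[@]}" "${QC_OPTS[@]}" \
        --sample "$SAMPLE" --out "$PROJECT/rare_variant_$mode"
    run_atav "--list-$mode" "${MODIFIER_OPTS[@]}" "${QC_OPTS[@]}" \
        --sample "$SAMPLE" --out "$PROJECT/modifier_known_variant_$mode"
}

# Usage:
# export SAMPLE=trio_samples.txt
# run_diagnostic_lists trio
```

Keeping the thresholds in shared arrays means a change to, say, the --gq cutoff applies to trio and singleton runs alike; running once with DRY_RUN=1 shows the exact commands before any job is launched.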

Coverage Summary (for trio & singleton): 

java -jar "$ATAV_HOME/atav_trunk.jar" --coverage-summary \
  --gene-boundaries /nfs/goldstein/software/atav_home/data/ccds/addjusted.CCDS.genes.index.r20.hg19.txt \
  --min-coverage 10 --percent-region-covered .9 --sample "$SAMPLE" --out "$OUTPUT"

Note: all external datasets used in variant prioritization / tier classification are included by default.

Variant prioritization:

[Figure: Variant prioritization]

Variant reviewing SOP

Trio (or Duo) analysis output

  • rare_variant_trio_genotypes.csv: rare variants flagged as Tier 1, Tier 2, LoF Gene, or Known Variant
  • rare_variant_trio_genotypes_noflag.csv: rare variants that meet none of the flags above
  • modifier_known_variant_trio_genotypes.csv: all non-coding known variants

Singleton analysis output

  • rare_variant_singleton_genotypes.csv: rare variants flagged as Tier 1, Tier 2, LoF Gene, or Known Variant
  • rare_variant_singleton_genotypes_noflag.csv: rare variants that meet none of the flags above
  • modifier_known_variant_singleton_genotypes.csv: all non-coding known variants

How to document Notes, Clinical Fit and Curation Results

  1. Save the CSV file as an Excel file.
  2. Create two new columns at the very left of the output. Name the first column “Notes” and the second column “Clinical Fit”.
  3. Write your notes in the Notes column.
  4. Document the clinical fit in the Clinical Fit column.
  5. Highlight variants you want to present in green, and rows that need more research in yellow.
  6. Highlight variants you have analyzed and ruled out in gray, since they may come up more than once under the filters below.

SOP for Trio/Duo Analysis of Rare Coding Variants (rare_variant_trio_genotypes.csv)

  • Apply an Excel filter to the Single Variant Prioritization column to review all single variants:
    • 01_TIER1_DNM_HZ: Tier 1 De novo variants in "Hot Zone"
    • 02_TIER1_DNM: Tier 1 De novo non-HZ variants
    • 03_TIER1_HOMO_HEMI: Tier 1 Homo/Hemi variants
    • 04_TIER2_DNM: Tier 2 De novo variants
    • 05_TIER2_HOMO_HEMI: Tier 2 Homo/Hemi variants
    • 06_LOF_GENE: LoF Dominant and Haploinsufficient Gene
    • 07_KNOWN_VAR: Previously reported ClinVar P/LP variants or HGMD DM (as long as it is NOT ClinVar B/LB)
    • 08_CLINVAR_SITE: >=1 ClinVar P/LP variant at site. Includes precise indel overlaps.
    • 09_CLINVAR_2BP: >= 1 ClinVar P/LP variant within 2bp flanking. Includes precise indel overlaps.
    • 10_HGMD_SITE: >=1 HGMD DM variant at site. Includes precise indel overlaps.
    • 11_MIS_HOT_SPOT: Missense Dominant and Haploinsufficient Gene + ClinVar PLP 25bpflanks count >= 6
    • 12_TIER1_OMIM_MIS_INFRAME: High-quality ultra rare novel missense/inframe indel variants
    • 13_ACMG_GENE: ACMG v3 gene associated variants
  • Apply an Excel filter to the Compound Het Variant Prioritization column to review all compound heterozygous variants:
    • 01_TIER1
    • 02_TIER2
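For a quick overview before opening Excel, the number of rows in each prioritization category can be tallied from the command line. A minimal sketch, assuming the CSV header contains the column names used above and that no field contains an embedded newline; tally_column is a hypothetical helper, and it works the same way on the singleton outputs.

```shell
#!/usr/bin/env bash
# Sketch only: count rows per value of one column in an ATAV CSV output.
# Handles quoted fields and doubled quotes, not newlines inside a field.
set -euo pipefail

tally_column() {
    local csv_file="$1" column_name="$2"
    awk -v col="$column_name" '
        # Split one CSV line into arr[1..n], honouring double-quoted fields.
        function split_csv(line, arr,    n, i, c, field, inq) {
            n = 0; field = ""; inq = 0
            for (i = 1; i <= length(line); i++) {
                c = substr(line, i, 1)
                if (inq && c == "\"" && substr(line, i + 1, 1) == "\"") { field = field c; i++ }
                else if (c == "\"") inq = !inq
                else if (c == "," && !inq) { arr[++n] = field; field = "" }
                else field = field c
            }
            arr[++n] = field
            return n
        }
        { sub(/\r$/, "") }                 # tolerate Windows line endings
        NR == 1 {                          # locate the requested column
            n = split_csv($0, hdr)
            for (i = 1; i <= n; i++) if (hdr[i] == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        $0 == "" { next }                  # skip blank lines
        {
            m = split_csv($0, f)
            v = (idx <= m && f[idx] != "") ? f[idx] : "(empty)"
            count[v]++
        }
        END { if (idx) for (v in count) print v "\t" count[v] }
    ' "$csv_file" | LC_ALL=C sort
}

# Usage:
# tally_column rare_variant_trio_genotypes.csv "Single Variant Prioritization"
# tally_column rare_variant_trio_genotypes.csv "Compound Het Variant Prioritization"
```

Because the category values are zero-padded (01_, 02_, ...), the sorted output follows the review order listed above.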

SOP for Trio Analysis of Rare Coding Variants that meet no Flags (rare_variant_trio_genotypes_noflag.csv)

These are variants that DO NOT meet any of the flags (Tier 1, Tier 2, LoF Gene, or Known Variant). This output is useful when you have a specific gene or gene list in mind, when you are looking for a second variant, when the patient has a more common phenotype, or when you are looking for variants inherited from the parents. Note that duos are also included in this output. Review all variants/genes in this output for clinical overlap.

SOP for Trio Analysis of Non-Coding Known Variants (modifier_known_variant_trio_genotypes.csv)

All variants in this output meet the Known Pathogenic Variant flag: previously reported ClinVar (CV) P/LP variants, or HGMD DM variants as long as they are NOT ClinVar B/LB. Note that duos are also included in this output.

  1. Apply an Excel filter to the “Tier Flag (Single Var)” column to show Tier 1 or Tier 2 variants.
  2. Remove the Tier filter, then filter the “ACMG Disease” column to exclude NA and review those variants.
  3. Remove the ACMG filter, then filter the “Gene Link” column to show genes that have an OMIM entry.
  4. Quickly review the associated phenotypes to see whether any of them fits the patient.

SOP for Singleton Analysis of Rare Coding Variants (rare_variant_singleton_genotypes.csv)

  • Apply an Excel filter to the Single Variant Prioritization column to review all single variants:
    • 01_LOF_GENE: LoF Dominant and Haploinsufficient Gene
    • 02_TIER1_HOMO_HEMI: Tier 1 Homo/Hemi variants
    • 03_KNOWN_VAR: Previously reported ClinVar P/LP variants or HGMD DM (as long as it is NOT ClinVar B/LB)
    • 04_CLINVAR_SITE: >=1 ClinVar P/LP variant at site. Includes precise indel overlaps.
    • 05_CLINVAR_2BP: >= 1 ClinVar P/LP variant within 2bp flanking. Includes precise indel overlaps.
    • 06_HGMD_SITE: >=1 HGMD DM variant at site. Includes precise indel overlaps.
    • 07_MIS_HOT_SPOT: Missense Dominant and Haploinsufficient Gene + ClinVar PLP 25bpflanks count >= 6
    • 08_TIER1_OMIM_MIS_INFRAME: High-quality ultra rare novel missense/inframe indel variants
    • 09_ACMG_GENE: ACMG v3 gene associated variants
  • Apply an Excel filter to the Compound Het Variant Prioritization column to review all compound heterozygous variants:
    • 01_TIER1
    • 02_TIER2

SOP for Singleton Analysis of Rare Coding Variants that meet no Flags (rare_variant_singleton_genotypes_noflag.csv)

These are variants that DO NOT meet any of the flags (Tier 1, Tier 2, LoF Gene, or Known Variant). This list will include inherited variants, and it is useful when you have a specific gene or gene list in mind, when you are looking for a second variant, or when the patient has a more common phenotype. Review all variants/genes in this output for clinical overlap.

SOP for Singleton Analysis of Non-Coding Known Variants (modifier_known_variant_singleton_genotypes.csv)

All variants in this output meet the Known Pathogenic Variant flag: previously reported ClinVar (CV) P/LP variants, or HGMD DM variants as long as they are NOT ClinVar B/LB.

  1. Apply an Excel filter to the “Tier Flag (Single Var)” column to show Tier 1 or Tier 2 variants.
  2. Remove the Tier filter, then filter the “ACMG Disease” column to exclude NA and review those variants.
  3. Remove the ACMG filter, then filter the “Gene Link” column to show genes that have an OMIM entry.
  4. Quickly review the associated phenotypes to see whether any of them fits the patient.

We suggest focusing mainly on the rare coding variant output (rare_variant_trio_genotypes.csv or rare_variant_singleton_genotypes.csv).

Prepare sample files

Columns:
1. Family ID: a family ID, or the same value as Individual ID to indicate that the sample is being used as a non-family sample.
2. Individual ID: sample name
3. Paternal ID: either the father's sample ID, or 0 if run as singleton.
4. Maternal ID: either the mother's sample ID, or 0 if run as singleton.
5. Sex: 1 for male, 2 for female.
6. Phenotype: 1 for control, 2 for case.
7. Sample Type: Exome or genome.
8. Capture Kit: The sequencing capture kit.
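The first six columns follow the familiar PLINK PED layout. A minimal sketch that sanity-checks a sample file against the column rules above, assuming tab-separated columns and no header row (neither is stated on this page); check_sample_file, the example IDs, and the KIT_NAME capture-kit value are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: check an ATAV sample file (8 tab-separated columns assumed).
set -euo pipefail

check_sample_file() {
    awk -F'\t' '
        # Remember each Individual ID and the parents it points to.
        { id[$2] = 1; pat[NR] = $3; mat[NR] = $4 }
        NF != 8 { printf "line %d: expected 8 columns, found %d\n", NR, NF; bad = 1 }
        $5 != 1 && $5 != 2 { printf "line %d: sex must be 1 (male) or 2 (female)\n", NR; bad = 1 }
        $6 != 1 && $6 != 2 { printf "line %d: phenotype must be 1 (control) or 2 (case)\n", NR; bad = 1 }
        END {
            # Paternal/Maternal ID must be 0 or an Individual ID listed in the file.
            for (i = 1; i <= NR; i++) {
                if (pat[i] != "0" && !(pat[i] in id)) { printf "line %d: father %s not found\n", i, pat[i]; bad = 1 }
                if (mat[i] != "0" && !(mat[i] in id)) { printf "line %d: mother %s not found\n", i, mat[i]; bad = 1 }
            }
            exit bad
        }
    ' "$1"
}

# Usage: write a trio (affected child, unaffected parents), then check it.
# printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
#     FAM1 child1  father1 mother1 1 2 Exome KIT_NAME \
#     FAM1 father1 0       0       1 1 Exome KIT_NAME \
#     FAM1 mother1 0       0       2 1 Exome KIT_NAME > trio_samples.txt
# check_sample_file trio_samples.txt && echo "sample file looks valid"
```

The function prints one message per problem and exits non-zero if any check fails, so it can guard the ATAV run in a script.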