Detailed data model

Stuart Watt edited this page Nov 2, 2015 · 7 revisions

Principles

Heliotrope uses MongoDB, a document-oriented database, because it is schema-less. This lets Heliotrope incorporate new kinds of data in the future without changing existing data structures. The primary collections are:

  • genes - data associated with genes
  • variants - data associated with variants
  • annotations - both user- and system-supplied annotations, keyed to either a variant or a gene
  • tags - classifications for variant types, used for annotations
  • statistics - precomputed statistical results that are slow to calculate dynamically, e.g., the primary gene frequencies
  • variantRecords - sample-level variant occurrences, used to generate frequency information (mainly from COSMIC)

Genes

See the [gene data model page](Gene Data Model).

Variants

Positions

Position data for a variant is stored in sections.positions.data, which is a list of objects with the following fields:

  • start - integer - the start position on the chromosome
  • stop - integer - the stop position on the chromosome
  • chromosome - string - the chromosome name, e.g., 1-22, X, Y, ...
  • strand - integer - the strand, either +1 or -1
  • cdsPosition - integer - the start of the variant as a coding position
  • codon - integer - the start of the variant as a codon number
  • exon - integer - the exon number
  • HGVSc - string - the variant in HGVS nomenclature, DNA system
  • HGVSp - string - the variant in HGVS nomenclature, amino-acid system
  • HGVSpr - string - the variant in HGVS nomenclature, amino-acid system, using one-letter amino-acid codes instead of three-letter codes
  • referenceAllele - string - the reference allele for a mutation
  • variantAllele - string - the variant allele for a mutation
  • consequence - string - the consequence type, e.g., synonymous_variant
  • gene - string - the Ensembl identifier for the gene
  • transcript - string - the Ensembl identifier for the transcript for this position
  • sift - object - an object with two keys, a level and a score, for SIFT prediction
  • polyphen - object - an object with two keys, a level and a score, for PolyPhen prediction
  • significance - string - from dbSNP, e.g., pathogenic

Notes:

  1. Only positions for the canonical transcript are stored.
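
To make the layout above concrete, here is a minimal sketch in Python of a variant document with a single position object, plus two small helpers. Everything in it is illustrative rather than taken from Heliotrope: the field values (including the placeholder Ensembl identifiers), the helper names `get_positions` and `check_position`, and the assumption that any field may be absent from a given position object.

```python
# Expected Python type for each documented field of a position object.
POSITION_FIELD_TYPES = {
    "start": int, "stop": int, "chromosome": str, "strand": int,
    "cdsPosition": int, "codon": int, "exon": int,
    "HGVSc": str, "HGVSp": str, "HGVSpr": str,
    "referenceAllele": str, "variantAllele": str, "consequence": str,
    "gene": str, "transcript": str,
    "sift": dict, "polyphen": dict, "significance": str,
}

# Illustrative values only; not real Heliotrope data.
example_position = {
    "start": 1000, "stop": 1000, "chromosome": "7", "strand": -1,
    "cdsPosition": 1799, "codon": 600, "exon": 15,
    "HGVSc": "c.1799T>A", "HGVSp": "p.Val600Glu", "HGVSpr": "p.V600E",
    "referenceAllele": "A", "variantAllele": "T",
    "consequence": "missense_variant",
    "gene": "ENSG00000000000",        # placeholder identifier
    "transcript": "ENST00000000000",  # placeholder identifier
    "sift": {"level": "deleterious", "score": 0.0},
    "polyphen": {"level": "probably_damaging", "score": 0.97},
    "significance": "pathogenic",
}

# Only the sections.positions.data path comes from the page above;
# other top-level keys of a variant document are omitted here.
example_variant = {"sections": {"positions": {"data": [example_position]}}}


def get_positions(variant):
    """Return the list at sections.positions.data, or [] if it is absent."""
    return variant.get("sections", {}).get("positions", {}).get("data", [])


def check_position(position):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, expected in POSITION_FIELD_TYPES.items():
        if name not in position:
            continue  # assumption: fields are optional in a schema-less store
        value = position[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}")
    strand = position.get("strand")
    if isinstance(strand, int) and strand not in (1, -1):
        problems.append("strand: must be +1 or -1")
    for name in ("sift", "polyphen"):
        prediction = position.get(name)
        if isinstance(prediction, dict):
            for key in ("level", "score"):
                if key not in prediction:
                    problems.append(f"{name}: missing key '{key}'")
    return problems
```

A check like this can be run over `get_positions(variant)` before a document is written to the variants collection, so that type errors surface early even though MongoDB itself does not enforce a schema.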