Detailed data model
Stuart Watt edited this page Nov 2, 2015 · 7 revisions
Heliotrope uses a document-oriented database, MongoDB, because of its schema-less nature. This allows it to incorporate future data without affecting existing data structures. There are a number of primary collections:
- genes - data associated with genes
- variants - data associated with variants
- annotations - both user- and system-supplied annotations, keyed to either a variant or a gene
- tags - classifications for variant types, used for annotations
- statistics - precomputed statistical results that are slow to calculate dynamically, e.g., the primary gene frequencies
- variantRecords - sample-level variant occurrences, used to generate frequency information (mainly from COSMIC)
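As a rough illustration of why a schema-less store helps here, the sketch below uses plain Python dictionaries to stand in for two documents in the same collection. The `name` field, the `futureSection` section, and the `section_data` helper are hypothetical and only serve the example; only the `sections.<name>.data` nesting is taken from this page.

```python
# Two stand-in documents for the same collection. Field names other than
# sections.positions.data are hypothetical, not Heliotrope's actual schema.
older_variant = {
    "name": "example-variant-1",
    "sections": {"positions": {"data": []}},
}
newer_variant = {
    "name": "example-variant-2",
    "sections": {
        "positions": {"data": []},
        # A section added later; older documents simply lack it.
        "futureSection": {"data": {"note": "added later"}},
    },
}


def section_data(document, section, default=None):
    """Return sections.<section>.data, or default when the section is absent."""
    return document.get("sections", {}).get(section, {}).get("data", default)
```

Code written this way reads both documents without a migration: `section_data(older_variant, "futureSection")` returns the default, while the newer document returns its data.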
See the [gene data model page](Gene Data Model).
Variant position data is stored in `sections.positions.data`, and is a list of objects with the following fields:
- start - integer - the start position on the chromosome
- stop - integer - the stop position on the chromosome
- chromosome - string - the chromosome name, e.g., 1-22, X, Y, ...
- strand - integer - the strand, either +1 or -1
- cdsPosition - integer - the start of the variant as a coding position
- codon - integer - the start of the variant as a codon number
- exon - integer - the exon number
- HGVSc - string - the variant in HGVS nomenclature, DNA system
- HGVSp - string - the variant in HGVS nomenclature, amino-acid system
- HGVSpr - string - the variant in HGVS nomenclature, amino-acid system, using letters instead of codes
- referenceAllele - string - the reference allele for a mutation
- variantAllele - string - the variant allele for a mutation
- consequence - string - the consequence type, e.g., `synonymous_variant`
- gene - string - Ensembl identifier for the gene
- transcript - string - Ensembl identifier for the transcript for this position
- sift - object - an object with two keys, a level and a score, for SIFT prediction
- polyphen - object - an object with two keys, a level and a score, for PolyPhen prediction
- significance - string - from dbSNP, e.g., `pathogenic`
Notes:
- Only positions for the canonical transcript are stored.
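The field list above can be sketched as a small type check in Python. This is a minimal sketch, not Heliotrope's own validation: the mapping of the documented integer/string/object types onto `int`/`str`/`dict`, the treatment of `gene` and `transcript` as strings, and the `position_problems` helper are all assumptions, and the values in `example_position` are made-up placeholders rather than real data.

```python
# Assumed Python types for the documented fields (integer -> int,
# string -> str, object -> dict). gene and transcript are assumed to be str.
POSITION_FIELD_TYPES = {
    "start": int, "stop": int, "chromosome": str, "strand": int,
    "cdsPosition": int, "codon": int, "exon": int,
    "HGVSc": str, "HGVSp": str, "HGVSpr": str,
    "referenceAllele": str, "variantAllele": str,
    "consequence": str, "gene": str, "transcript": str,
    "sift": dict, "polyphen": dict, "significance": str,
}


def position_problems(position):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field, value in position.items():
        expected = POSITION_FIELD_TYPES.get(field)
        if expected is None:
            continue  # unknown fields are tolerated in a schema-less store
        if not isinstance(value, expected):
            problems.append(f"{field} should be {expected.__name__}")
    if "strand" in position and position["strand"] not in (1, -1):
        problems.append("strand must be +1 or -1")
    for predictor in ("sift", "polyphen"):
        prediction = position.get(predictor)
        if isinstance(prediction, dict) and set(prediction) != {"level", "score"}:
            problems.append(f"{predictor} should have exactly the keys level and score")
    return problems


# Placeholder values for illustration only; not taken from any real variant.
example_position = {
    "start": 100, "stop": 100, "chromosome": "1", "strand": 1,
    "consequence": "synonymous_variant",
    "sift": {"level": "placeholder", "score": 0.5},
    "significance": "pathogenic",
}
```

The check only inspects fields that are present, since this page does not say which fields are required.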