Stuart Watt edited this page Nov 2, 2015 · 2 revisions

The gene data model is fairly complex and hierarchical, as is typical for a document-oriented database. Glossing over some of its informal aspects and unpacking it into a pseudo-relational structure, it looks like this:

Gene

  • name - readable name
  • version
  • id - the Ensembl gene ID
  • transcripts - collection of GeneTranscript records
  • description - a GeneDescription record

Variants

  • gene - e.g., "ENSG00000186092"
  • transcript - e.g., "ENST00000335137"
  • position - e.g., 69345
  • genomicPositionStart - e.g., 69345
  • referenceAllele - e.g., "C"
  • codonStart - e.g., 85
  • codonStop - e.g., 85
  • cdsPositionStart - e.g., 255
  • cdsPositionStop - e.g., 255
  • strand - e.g., 1
  • HGVSc - e.g., "c.255C>A"
  • HGVSpr - e.g., "p.I85I"
  • exon - e.g., 1
  • variantAllele - e.g., "A"
  • HGVSp - e.g., "p.Ile85Ile"
  • geneSymbol - e.g., "OR4F5"
  • chromosome - e.g., "1"
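
As a rough illustration, a Variants record carrying the example values above might look like the Python dict below. The dict layout and the helper `codon_for_cds_position` are assumptions made for this sketch and are not part of the stored model. The helper only shows how the example `cdsPositionStart` of 255 lines up with the example `codonStart` of 85, assuming both are 1-based.

```python
# Hypothetical Variants document, filled with the example values listed above.
variant = {
    "gene": "ENSG00000186092",
    "transcript": "ENST00000335137",
    "position": 69345,
    "genomicPositionStart": 69345,
    "referenceAllele": "C",
    "variantAllele": "A",
    "codonStart": 85,
    "codonStop": 85,
    "cdsPositionStart": 255,
    "cdsPositionStop": 255,
    "strand": 1,
    "exon": 1,
    "HGVSc": "c.255C>A",
    "HGVSpr": "p.I85I",
    "HGVSp": "p.Ile85Ile",
    "geneSymbol": "OR4F5",
    "chromosome": "1",
}


def codon_for_cds_position(cds_position: int) -> int:
    """Return the 1-based codon number that contains a 1-based CDS position.

    Assumption for this sketch: CDS positions and codon numbers both start at 1.
    """
    return (cds_position + 2) // 3


# The example record is internally consistent under that assumption.
assert codon_for_cds_position(variant["cdsPositionStart"]) == variant["codonStart"]
assert codon_for_cds_position(variant["cdsPositionStop"]) == variant["codonStop"]
```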

GeneTranscript

  • id - e.g., "ENST00000425967"
  • versionedId - e.g., "ENST00000425967.3"
  • name - e.g., "FGFR1-202"
  • translationId - e.g., "ENSP00000393312"
  • versionedTranslationId - e.g., "ENSP00000393312.3"
  • lengthAminoAcid
  • lengthDNA - e.g., 2562
  • numberOfExons - e.g., 18
  • seqExonStart - e.g., 150
  • seqExonStart - e.g., 33
  • length - e.g., 5375
  • seqExonEnd - e.g., 154
  • endExon - e.g., 18
  • isCanonical - e.g., true
  • refSeqId - collection of strings, e.g., ["NM_001174067.1"]
  • exons - collection of GeneTranscriptExon records
  • domains - collection of GeneTranscriptDomain records

GeneTranscriptDomain

  • gffSource - e.g., "Pfam"
  • hitName - e.g., "PF07679"
  • score - e.g., 47.7
  • evalue - e.g., 4.1e-16
  • perc_ident - e.g., 0
  • start - e.g., 293
  • end - e.g., 389
  • interproId - e.g., "IPR013098"
  • description - e.g., "Immunoglobulin I-set"

GeneTranscriptExon

  • start - e.g., 38315052
  • end - e.g., 38314874
  • startPhase - one of -1, 0, 1, or 2; e.g., -1
  • endPhase - e.g., 1

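
Note that in the example above `start` is larger than `end`. A minimal sketch of an exon record that tolerates this ordering follows. It assumes the coordinates are 1-based and inclusive, and that `start` greater than `end` simply reflects a reverse-strand transcript. Neither assumption is stated by the model itself, and the `length` method is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class GeneTranscriptExon:
    """Sketch of the exon record; field names follow the list above."""

    start: int
    end: int
    startPhase: int
    endPhase: int

    def length(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        # abs() copes with start > end, as in the example values.
        return abs(self.end - self.start) + 1


# Example values taken from the field list above.
exon = GeneTranscriptExon(start=38315052, end=38314874, startPhase=-1, endPhase=1)
print(exon.length())
```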
GeneDescription

  • fullName - e.g., "fibroblast growth factor receptor 1"
  • synonyms - collection of strings, e.g., ["FGFR1", "CD331" ...]
  • summary - long text from RefSeq
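
To show how these records nest, here is a minimal sketch of a single Gene document as a Python dict, with only a few fields filled in. The gene `id`, `name`, and `version` values are placeholders invented for the sketch, and the helpers `canonical_transcript` and `domain_hits` are hypothetical. The remaining values are the examples listed above.

```python
from typing import Any, Optional

# Hypothetical, heavily trimmed Gene document.
gene: dict[str, Any] = {
    "id": "ENSG00000000000",  # placeholder, not a real lookup
    "name": "EXAMPLE",        # placeholder readable name
    "version": 1,             # placeholder
    "description": {
        "fullName": "fibroblast growth factor receptor 1",
        "synonyms": ["FGFR1", "CD331"],
        "summary": "long text from RefSeq",
    },
    "transcripts": [
        {
            "id": "ENST00000425967",
            "versionedId": "ENST00000425967.3",
            "name": "FGFR1-202",
            "isCanonical": True,
            "numberOfExons": 18,
            "refSeqId": ["NM_001174067.1"],
            "exons": [
                {"start": 38315052, "end": 38314874, "startPhase": -1, "endPhase": 1},
            ],
            "domains": [
                {
                    "gffSource": "Pfam",
                    "hitName": "PF07679",
                    "start": 293,
                    "end": 389,
                    "interproId": "IPR013098",
                    "description": "Immunoglobulin I-set",
                },
            ],
        },
    ],
}


def canonical_transcript(gene_doc: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first transcript flagged isCanonical, or None if there is none."""
    for transcript in gene_doc.get("transcripts", []):
        if transcript.get("isCanonical"):
            return transcript
    return None


def domain_hits(transcript: dict[str, Any]) -> list[str]:
    """Collect the hitName of every domain attached to a transcript."""
    return [domain["hitName"] for domain in transcript.get("domains", [])]


transcript = canonical_transcript(gene)
if transcript is not None:
    print(transcript["name"], domain_hits(transcript))
```

Walking the document this way mirrors the hierarchy described here: a Gene holds GeneTranscript records, and each of those holds its own GeneTranscriptExon and GeneTranscriptDomain collections.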