Gene Data Model
Stuart Watt edited this page Nov 2, 2015 · 2 revisions
The gene data model is fairly complex and hierarchical, as is natural for a document-oriented database. Glossing over some of its informal details and unpacking it into a pseudo-relational structure, it goes like this:

Gene
- name - human-readable name
- id - the Ensembl gene ID
- transcripts - collection of GeneTranscript records
- description - a GeneDescription record
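The top-level gene record can be mirrored directly in application code. The Python sketch below is one possible shape, not the project's own class: the `Gene` dataclass, the `from_document` helper and the ID pattern (which assumes human `ENSG` identifiers) are all hypothetical, and the nested records are left as plain dictionaries.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: human Ensembl gene IDs are "ENSG" followed by 11 digits;
# other species use different prefixes.
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}$")


@dataclass
class Gene:
    """Top-level gene record; nested records are kept as plain dicts here."""
    id: str    # Ensembl gene ID
    name: str  # human-readable name
    transcripts: list[dict[str, Any]] = field(default_factory=list)  # GeneTranscript records
    description: Optional[dict[str, Any]] = None  # GeneDescription record

    @classmethod
    def from_document(cls, doc: dict[str, Any]) -> "Gene":
        """Build a Gene from a stored document, rejecting malformed gene IDs."""
        gene_id = doc["id"]
        if not ENSEMBL_GENE_ID.match(gene_id):
            raise ValueError(f"unexpected Ensembl gene ID: {gene_id!r}")
        return cls(
            id=gene_id,
            name=doc["name"],
            transcripts=list(doc.get("transcripts", [])),
            description=doc.get("description"),
        )


gene = Gene.from_document({"id": "ENSG00000186092", "name": "OR4F5"})
print(gene.id, gene.name, len(gene.transcripts))  # ENSG00000186092 OR4F5 0
```

Keeping the nested collections as dictionaries keeps the sketch short; a fuller version would give each nested record type its own class.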

Variant annotation record
- gene - e.g., "ENSG00000186092"
- transcript - e.g., "ENST00000335137"
- position - e.g., 69345
- genomicPositionStart - e.g., 69345
- referenceAllele - e.g., "C"
- codonStart - e.g., 85
- codonStop - e.g., 85
- cdsPositionStart - e.g., 255
- cdsPositionStop - e.g., 255
- strand - e.g., 1
- HGVSc - e.g., "c.255C>A"
- HGVSpr - e.g., "p.I85I"
- exon - e.g., 1
- variantAllele - e.g., "A"
- HGVSp - e.g., "p.Ile85Ile"
- geneSymbol - e.g., "OR4F5"
- chromosome - e.g., "1"
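Several fields in this record are redundant: `HGVSc` repeats the CDS position and the two alleles, `codonStart` follows from `cdsPositionStart`, and `HGVSpr` is the one-letter form of `HGVSp`. A hypothetical consistency check, sketched in Python with the example values above, makes those relationships explicit. It handles only single-nucleotide substitutions, assumes 1-based CDS positions, and assumes the alleles are stored in genomic orientation (so they are compared with `HGVSc` only on the forward strand).

```python
import re

# One-letter to three-letter amino-acid codes (the 20 standard residues).
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
SNV_HGVSC = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")  # e.g. "c.255C>A"
SHORT_HGVSP = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")   # e.g. "p.I85I"


def codon_for_cds_position(cds_pos: int) -> int:
    """Return the 1-based codon number containing a 1-based CDS position."""
    return (cds_pos - 1) // 3 + 1


def long_hgvsp(short: str) -> str:
    """Expand a one-letter protein change such as "p.I85I" to "p.Ile85Ile"."""
    m = SHORT_HGVSP.match(short)
    if m is None:
        raise ValueError(f"unsupported protein change: {short!r}")
    ref, pos, alt = m.groups()
    return f"p.{AA3[ref]}{pos}{AA3[alt]}"


def check_snv(rec: dict) -> list[str]:
    """List inconsistencies between the redundant fields of an SNV record."""
    m = SNV_HGVSC.match(rec["HGVSc"])
    if m is None:
        return ["HGVSc is not a simple substitution"]
    pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
    problems = []
    if pos != rec["cdsPositionStart"]:
        problems.append("HGVSc position differs from cdsPositionStart")
    # Assumption: alleles are in genomic orientation, so they can only be
    # compared with HGVSc directly on the forward strand.
    if rec["strand"] == 1 and (ref, alt) != (rec["referenceAllele"], rec["variantAllele"]):
        problems.append("HGVSc alleles differ from referenceAllele/variantAllele")
    if codon_for_cds_position(pos) != rec["codonStart"]:
        problems.append("codonStart does not contain the CDS position")
    if long_hgvsp(rec["HGVSpr"]) != rec["HGVSp"]:
        problems.append("HGVSpr and HGVSp disagree")
    return problems


example = {
    "cdsPositionStart": 255, "codonStart": 85, "strand": 1,
    "referenceAllele": "C", "variantAllele": "A",
    "HGVSc": "c.255C>A", "HGVSpr": "p.I85I", "HGVSp": "p.Ile85Ile",
}
print(check_snv(example))  # []
```

For the example, CDS position 255 falls in codon (255 - 1) // 3 + 1 = 85, which matches `codonStart`.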

GeneTranscript
- id - e.g., "ENST00000425967"
- versionedId - e.g., "ENST00000425967.3"
- name - e.g., "FGFR1-202"
- translationId - e.g., "ENSP00000393312"
- versionedTranslationId - e.g., "ENSP00000393312.3"
- lengthAminoAcid
- lengthDNA - e.g., 2562
- numberOfExons - e.g., 18
- seqExonStart - e.g., 150
- seqExonStart - e.g., 33
- length - e.g., 5375
- seqExonEnd - e.g., 154
- endExon - e.g., 18
- isCanonical - e.g., true
- refSeqId - collection of strings, e.g., ["NM_001174067.1"]
- exons - collection of GeneTranscriptExon records
- domains - collection of GeneTranscriptDomain records
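The versioned identifiers are the unversioned ones with a `.N` version suffix, and `isCanonical` flags the preferred transcript. A short Python sketch of how client code might use those fields; the helper names are hypothetical, and `canonical_transcript` simply takes the first flagged transcript, so it does not rely on exactly one being flagged.

```python
from typing import Optional


def split_versioned_id(versioned: str) -> tuple[str, int]:
    """Split an Ensembl stable ID such as "ENST00000425967.3" into (id, version)."""
    stable_id, sep, version = versioned.rpartition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"not a versioned ID: {versioned!r}")
    return stable_id, int(version)


def ids_consistent(transcript: dict) -> bool:
    """Check that the versioned IDs agree with the unversioned ones."""
    tid, _ = split_versioned_id(transcript["versionedId"])
    pid, _ = split_versioned_id(transcript["versionedTranslationId"])
    return tid == transcript["id"] and pid == transcript["translationId"]


def canonical_transcript(transcripts: list[dict]) -> Optional[dict]:
    """Return the first transcript flagged isCanonical, or None if there is none."""
    return next((t for t in transcripts if t.get("isCanonical")), None)


fgfr1_202 = {
    "id": "ENST00000425967", "versionedId": "ENST00000425967.3",
    "name": "FGFR1-202",
    "translationId": "ENSP00000393312",
    "versionedTranslationId": "ENSP00000393312.3",
    "isCanonical": True,
}
print(split_versioned_id(fgfr1_202["versionedId"]))  # ('ENST00000425967', 3)
print(ids_consistent(fgfr1_202))                     # True
print(canonical_transcript([fgfr1_202])["name"])     # FGFR1-202
```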

GeneTranscriptDomain
- gffSource - e.g., "Pfam"
- hitName - e.g., "PF07679"
- score - e.g., 47.7
- evalue - e.g., 4.1e-16
- perc_ident - e.g., 0
- start - e.g., 293
- end - e.g., 389
- interproId - e.g., "IPR013098"
- description - e.g., "Immunoglobulin I-set"
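Domain records look like hits from a protein-domain search; the example is a Pfam hit with an InterPro cross-reference. A Python sketch of filtering and ordering them follows. The class and function names are hypothetical, the e-value cut-off is arbitrary, and the sketch assumes `start` and `end` are inclusive positions along the translated protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneTranscriptDomain:
    # Field names mirror the stored record, hence camelCase and perc_ident.
    gffSource: str
    hitName: str
    score: float
    evalue: float
    perc_ident: float
    start: int
    end: int
    interproId: str
    description: str

    @property
    def span(self) -> int:
        """Residues covered, assuming inclusive 1-based coordinates."""
        return self.end - self.start + 1


def significant_domains(domains, source="Pfam", max_evalue=1e-5):
    """Hits from one source with evalue <= max_evalue, ordered by start.

    The 1e-5 cut-off is an illustrative default, not a project setting.
    """
    hits = [d for d in domains if d.gffSource == source and d.evalue <= max_evalue]
    return sorted(hits, key=lambda d: d.start)


ig = GeneTranscriptDomain("Pfam", "PF07679", 47.7, 4.1e-16, 0, 293, 389,
                          "IPR013098", "Immunoglobulin I-set")
print(ig.span)                                         # 97
print([d.hitName for d in significant_domains([ig])])  # ['PF07679']
```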

GeneTranscriptExon
- start - e.g., 38315052
- end - e.g., 38314874
- startPhase - one of -1, 0, or 1; e.g., -1
- endPhase - e.g., 1
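In the example, `start` is larger than `end`, which suggests exon coordinates are stored in transcript order, so reverse-strand exons run downwards. Under that assumption, a Python sketch that normalises the interval and computes the exon length (38315052 - 38314874 + 1 = 179 bases). The helpers are hypothetical; the phase check assumes exons are listed in transcript order and follows the Ensembl convention that an exon starts in the phase the previous exon ended in, with -1 marking a non-coding end.

```python
def exon_interval(exon: dict) -> tuple[int, int]:
    """Return (low, high) genomic coordinates, whichever order start/end use."""
    return min(exon["start"], exon["end"]), max(exon["start"], exon["end"])


def exon_length(exon: dict) -> int:
    """Exon length in bases, assuming inclusive 1-based coordinates."""
    low, high = exon_interval(exon)
    return high - low + 1


def phases_consistent(exons: list[dict]) -> bool:
    """Check that each exon starts in the phase the previous exon ended in.

    Assumes transcript order; pairs involving -1 (non-coding) are skipped.
    """
    for prev, nxt in zip(exons, exons[1:]):
        if -1 in (prev["endPhase"], nxt["startPhase"]):
            continue
        if prev["endPhase"] != nxt["startPhase"]:
            return False
    return True


exon = {"start": 38315052, "end": 38314874, "startPhase": -1, "endPhase": 1}
print(exon_interval(exon))  # (38314874, 38315052)
print(exon_length(exon))    # 179
```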

GeneDescription
- fullName - e.g., "fibroblast growth factor receptor 1"
- synonyms - collection of strings, e.g., ["FGFR1", "CD331", ...]
- summary - long free-text summary from RefSeq
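The introduction describes unpacking the hierarchical document into a pseudo-relational structure. One way that unpacking might look in Python: each nested collection becomes its own table, linked back to its parent by an added key. The function, the link columns (`gene_id`, `transcript_id`, `rank`) and the sample gene ID are hypothetical, and `rank` assumes exons are stored in transcript order.

```python
from typing import Any

Row = dict[str, Any]


def flatten_gene(doc: Row) -> dict[str, list[Row]]:
    """Unpack one nested gene document into pseudo-relational tables.

    The link columns gene_id, transcript_id and rank are hypothetical
    additions; they are not fields of the stored records.
    """
    tables: dict[str, list[Row]] = {
        "Gene": [], "GeneDescription": [], "GeneTranscript": [],
        "GeneTranscriptExon": [], "GeneTranscriptDomain": [],
    }
    gene_id = doc["id"]
    tables["Gene"].append({"id": gene_id, "name": doc.get("name")})
    if doc.get("description") is not None:
        tables["GeneDescription"].append({"gene_id": gene_id, **doc["description"]})
    for t in doc.get("transcripts", []):
        # Copy every transcript field except the nested record collections.
        row = {k: v for k, v in t.items() if k not in ("exons", "domains")}
        tables["GeneTranscript"].append({"gene_id": gene_id, **row})
        for rank, exon in enumerate(t.get("exons", []), start=1):
            tables["GeneTranscriptExon"].append({"transcript_id": t["id"], "rank": rank, **exon})
        for domain in t.get("domains", []):
            tables["GeneTranscriptDomain"].append({"transcript_id": t["id"], **domain})
    return tables


doc = {
    "id": "ENSG_HYPOTHETICAL", "name": "FGFR1",
    "description": {"fullName": "fibroblast growth factor receptor 1"},
    "transcripts": [{
        "id": "ENST00000425967", "name": "FGFR1-202",
        "exons": [{"start": 38315052, "end": 38314874}],
        "domains": [{"gffSource": "Pfam", "hitName": "PF07679"}],
    }],
}
tables = flatten_gene(doc)
print({name: len(rows) for name, rows in tables.items()})
# {'Gene': 1, 'GeneDescription': 1, 'GeneTranscript': 1, 'GeneTranscriptExon': 1, 'GeneTranscriptDomain': 1}
```

Collection-valued fields such as `refSeqId` and `synonyms` stay as lists in their rows, which is why the result is only pseudo-relational.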