# 3 Basic Models
This notebook shows how to compose VRS objects from component classes rather than from a nomenclature string (HGVS, SPDI, or gnomAD-VCF).


#### Step 1 - Set Up Data Proxy Access
The *DataProxy* provides access to sequence references.

In [1]:
from ga4gh.vrs.dataproxy import create_dataproxy
seqrepo_rest_service_url = "seqrepo+https://services.genomicmedlab.org/seqrepo"
seqrepo_dataproxy = create_dataproxy(uri=seqrepo_rest_service_url)
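
The URI passed to `create_dataproxy` names a provider and a transport in its scheme (here `seqrepo` and `https`). As a rough illustration only, the standard library can take such a URI apart as shown below. `split_dataproxy_uri` is a hypothetical helper and is not claimed to mirror how the library parses the URI internally.

```python
from urllib.parse import urlparse


def split_dataproxy_uri(uri):
    """Split a 'provider+protocol://host/path' URI into its parts (illustrative helper)."""
    parsed = urlparse(uri)
    if "+" not in parsed.scheme:
        raise ValueError(f"expected a 'provider+protocol' scheme, got {parsed.scheme!r}")
    # The scheme carries both the provider name and the transport protocol.
    provider, protocol = parsed.scheme.split("+", 1)
    return provider, protocol, parsed.netloc, parsed.path


provider, protocol, host, path = split_dataproxy_uri(
    "seqrepo+https://services.genomicmedlab.org/seqrepo"
)
print(provider, protocol, host, path)
```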

#### Step 2 - Access the VRS models package
The models package contains the various classes necessary for building VRS objects.

In [2]:
from ga4gh.vrs import models

#### Step 3 - Build the Allele
In this example we build a VRS object for the variant "NC_000005.10:g.80656510delinsTT". This variant can be viewed in [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/variation/2673535/).

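Start by using the *DataProxy* object to derive the refget accession, which is the identifier VRS uses for the sequence reference. Note that the cell below looks up the transcript accession refseq:NM_002439.5, whereas the HGVS expression above is written against the chromosome 5 reference NC_000005.10; the outputs in this notebook reflect the accession actually passed in.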

In [3]:
refget_accession = seqrepo_dataproxy.derive_refget_accession('refseq:NM_002439.5')
print(refget_accession)

SQ.Pw3Ch0x3XWD6ljsnIfmk_NERcZCI9sNM
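
The value printed above has the form `SQ.` followed by a 32-character digest. In the GA4GH scheme this digest is sha512t24u: the SHA-512 hash of the sequence, truncated to 24 bytes and base64url-encoded. The sketch below reimplements that digest with the standard library and applies it to a toy sequence. `toy_refget_accession` and the sequence "ACGT" are illustrative assumptions; the real accession is derived by the *DataProxy* from the full reference sequence.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 of blob, truncated to 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def toy_refget_accession(sequence: str) -> str:
    """Illustrative helper: prefix the sequence digest with 'SQ.'."""
    return "SQ." + sha512t24u(sequence.encode("ascii"))


accession = toy_refget_accession("ACGT")  # toy sequence, not a real reference
print(accession, len(accession))
```

Because 24 bytes always encode to exactly 32 base64url characters, every accession produced this way has the same length as the one shown above.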


Build a dictionary of type *SequenceReference* containing the refget accession. Then build, in order, dictionaries of type *SequenceLocation*, *LiteralSequenceExpression*, and *Allele*, each referencing the previously built structures where applicable.

In [4]:
sequence_reference_dict = {
    "type": "SequenceReference",
    "refgetAccession": refget_accession
}

sequence_location_dict = {
    "type": "SequenceLocation",
    "sequenceReference": sequence_reference_dict,
    "start": 80656509,
    "end": 80656510
}

literal_sequence_expression_dict = {
    "type": "LiteralSequenceExpression",
    "sequence": "TT"
}

allele_dict = {
    "type": "Allele",
    "location": sequence_location_dict,
    "state": literal_sequence_expression_dict
}
allele = models.Allele(**allele_dict)
allele.model_dump(exclude_none=True)

{'type': 'Allele',
 'location': {'type': 'SequenceLocation',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.Pw3Ch0x3XWD6ljsnIfmk_NERcZCI9sNM'},
  'start': 80656509,
  'end': 80656510},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'TT'}}
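
The `start` and `end` values in the location are inter-residue (interbase) coordinates, which VRS uses for locations, whereas HGVS positions are 1-based and inclusive. That is why position 80656510 appears as start 80656509 and end 80656510. A minimal sketch of the conversion, assuming a simple 1-based inclusive range; `hgvs_range_to_interbase` is a hypothetical helper, not part of the VRS library:

```python
def hgvs_range_to_interbase(first, last=None):
    """Convert a 1-based inclusive range to interbase (start, end).

    A single position is treated as a range of length one.
    """
    if last is None:
        last = first
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # Interbase start sits just before the first residue; end sits just after the last.
    return first - 1, last


start, end = hgvs_range_to_interbase(80656510)
print(start, end)  # 80656509 80656510
```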

The *Allele* object above was built from component dictionaries, so it is not yet complete: its identifiable objects do not yet carry VRS identifiers. Note that not every object within the *Allele* is VRS identifiable.

In [5]:
def is_identifiable(obj):
    """Print whether a VRS object is GA4GH identifiable."""
    print(obj.__class__.__name__, "identifiable?", obj.is_ga4gh_identifiable())

# Instantiate each component model from its dictionary and check it.
sequence_reference = models.SequenceReference(**sequence_reference_dict)
is_identifiable(sequence_reference)
sequence_location = models.SequenceLocation(**sequence_location_dict)
is_identifiable(sequence_location)
literal_sequence_expression = models.LiteralSequenceExpression(**literal_sequence_expression_dict)
is_identifiable(literal_sequence_expression)
is_identifiable(allele)

SequenceReference identifiable? False
SequenceLocation identifiable? True
LiteralSequenceExpression identifiable? False
Allele identifiable? True


#### Step 4 - Compute the identifiers
For the *Allele* to be a valid VRS object, every identifiable object it contains must have a valid VRS identifier. To compute these, call the *ga4gh_identify* function on each identifiable object (*SequenceLocation* and *Allele*).

In [6]:
from ga4gh.core import ga4gh_identify
allele.location.id = ga4gh_identify(allele.location)
allele.id = ga4gh_identify(allele)
allele.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.5C67OBmCLuHPgDkCQj7EOMih58BS2Eor',
 'type': 'Allele',
 'digest': '5C67OBmCLuHPgDkCQj7EOMih58BS2Eor',
 'location': {'id': 'ga4gh:SL.lGxOP1JRd4dysmrOVaskO5P_35DyCLnx',
  'type': 'SequenceLocation',
  'digest': 'lGxOP1JRd4dysmrOVaskO5P_35DyCLnx',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.Pw3Ch0x3XWD6ljsnIfmk_NERcZCI9sNM'},
  'start': 80656509,
  'end': 80656510},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'TT'}}

The output above is a complete VRS *Allele*, with VRS identifiers and digests on all of its identifiable objects.
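
In this output, each identifier is the corresponding digest prefixed with the `ga4gh` namespace and a short type code (`VA` for the *Allele*, `SL` for the *SequenceLocation*). The sketch below splits the identifiers shown in the output into those parts and checks them against the `digest` fields. `split_ga4gh_id` is a hypothetical helper written for illustration, not a function of the VRS library.

```python
def split_ga4gh_id(identifier):
    """Split 'ga4gh:<TYPE>.<digest>' into (namespace, type_prefix, digest)."""
    namespace, _, rest = identifier.partition(":")
    type_prefix, _, digest = rest.partition(".")
    if namespace != "ga4gh" or not type_prefix or not digest:
        raise ValueError(f"not a ga4gh identifier: {identifier!r}")
    return namespace, type_prefix, digest


# Identifier -> digest pairs copied from the output of the previous cell.
shown = {
    "ga4gh:VA.5C67OBmCLuHPgDkCQj7EOMih58BS2Eor": "5C67OBmCLuHPgDkCQj7EOMih58BS2Eor",
    "ga4gh:SL.lGxOP1JRd4dysmrOVaskO5P_35DyCLnx": "lGxOP1JRd4dysmrOVaskO5P_35DyCLnx",
}

for identifier, digest in shown.items():
    _, type_prefix, parsed_digest = split_ga4gh_id(identifier)
    print(type_prefix, parsed_digest == digest)
```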