# 1 Quick Start
This notebook shows how to get your vrs-python environment up and running in as few
steps as possible, and provides some rudimentary examples to confirm that it is working properly.

This assumes that you have successfully completed the prerequisite installation and setup steps detailed in [README.md](README.md).

The vrs-python package has a dependency on the [biocommons seqrepo package](https://github.com/biocommons/biocommons.seqrepo). SeqRepo is used for referencing biological sequences. In this series of notebooks we will be using the publicly available SeqRepo when initializing *DataProxy* as seen below in Step 1:

    seqrepo_rest_service_url = "seqrepo+https://services.genomicmedlab.org/seqrepo"

Another dependency of vrs-python is the [biocommons hgvs package](https://github.com/biocommons/hgvs) for parsing HGVS nomenclature. The hgvs package further relies on the [biocommons uta package](https://github.com/biocommons/uta). The Universal Transcript Archive (UTA) is a Postgres database that stores transcripts aligned to sequence references. When necessary, we will define `UTA_DB_URL` in the environment using the public access version as follows:

    UTA_DB_URL="postgresql://anonymous:anonymous@uta.biocommons.org:5432/uta/uta_20210129b"

**NOTE:** The external sources for the SeqRepo and UTA repositories are **ONLY** to be used as part of this notebook series and are not meant for use in production code. Please refer to the links above and follow the directions provided on how to set up local instances.

#### Step 1 - Set Up Data Proxy Access
The *DataProxy* provides access to sequence references.

In [1]:
from ga4gh.vrs.dataproxy import create_dataproxy
seqrepo_rest_service_url = "seqrepo+https://services.genomicmedlab.org/seqrepo"
seqrepo_dataproxy = create_dataproxy(uri=seqrepo_rest_service_url)

Make sure the UTA URL is defined in the environment by setting it explicitly:

In [2]:
import os
os.environ["UTA_DB_URL"] = "postgresql://anonymous:anonymous@uta.biocommons.org:5432/uta/uta_20210129b"
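
As a quick sanity check, the URL can be split into its parts before any database connection is attempted. This is a minimal sketch using only the standard library; the helper name `describe_uta_url` is hypothetical and is not part of vrs-python or hgvs, and reading the two path components as database name and schema is an assumption based on the URL shown above.

```python
from urllib.parse import urlparse

# Same public UTA URL used in the cell above.
UTA_DB_URL = "postgresql://anonymous:anonymous@uta.biocommons.org:5432/uta/uta_20210129b"


def describe_uta_url(url):
    """Split a UTA connection URL into its components (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: the path has the form /<database>/<schema>.
    database, schema = parsed.path.strip("/").split("/")
    return {
        "scheme": parsed.scheme,
        "user": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": database,
        "schema": schema,
    }


uta_parts = describe_uta_url(UTA_DB_URL)
print(uta_parts)
```

If the URL is mistyped (for example, a missing schema component), the tuple unpacking raises a `ValueError` here rather than failing later inside a translation call.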

#### Step 2 - Set Up an Allele Translator
Now we will create and use an *AlleleTranslator* to test that our setup is good. The *AlleleTranslator* can translate single nucleotide variants and simple insertion/deletion forms to VRS. Notice the *AlleleTranslator* dependency on the *DataProxy*.

In [3]:
from ga4gh.vrs.extras.translator import AlleleTranslator
translator = AlleleTranslator(data_proxy=seqrepo_dataproxy)

#### Step 3 - Translate variation representations to VRS
Now we are ready to have the *AlleleTranslator* transform our first SPDI variant representation to VRS.
This variant can be viewed in [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/variation/2673535/?oq=2673535).

In [4]:
vrs_from_spdi = translator.translate_from("NC_000005.10:80656509:C:TT", "spdi")
vrs_from_spdi.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'type': 'Allele',
 'digest': 'LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'location': {'id': 'ga4gh:SL.nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'type': 'SequenceLocation',
  'digest': 'nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'},
  'start': 80656509,
  'end': 80656510},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'TT'}}

The output above is the JSON structure of an *Allele* in VRS form. You should be able to recognize the *Allele*, *SequenceLocation*, *SequenceReference* and *LiteralSequenceExpression* classes.
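
To see where the `start` and `end` values in that output come from, the SPDI string can be unpacked by hand. The sketch below is illustrative only: `spdi_to_interbase` is a hypothetical helper, not the translator's implementation, and it skips the sequence lookup and normalization that the real *AlleleTranslator* performs. It relies on SPDI positions being 0-based, so the position is used directly as the interbase start.

```python
def spdi_to_interbase(spdi):
    """Unpack a SPDI string into interbase coordinates (hypothetical helper)."""
    sequence, position, deletion, insertion = spdi.split(":")
    start = int(position)
    # SPDI allows the deletion to be given as a sequence or as its length.
    deleted_length = int(deletion) if deletion.isdigit() else len(deletion)
    return {
        "sequence": sequence,
        "start": start,
        "end": start + deleted_length,
        "alt": insertion,
    }


spdi_coords = spdi_to_interbase("NC_000005.10:80656509:C:TT")
print(spdi_coords)
```

For this variant the deleted sequence `C` has length 1, so the span runs from 80656509 to 80656510, matching the `start` and `end` shown in the *SequenceLocation* above.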
 
Now we will pass the HGVS variant representation of the same variant to the *AlleleTranslator*.

In [5]:
vrs_from_hgvs = translator.translate_from("NC_000005.10:g.80656510delinsTT", "hgvs")
vrs_from_hgvs.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'type': 'Allele',
 'digest': 'LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'location': {'id': 'ga4gh:SL.nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'type': 'SequenceLocation',
  'digest': 'nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'},
  'start': 80656509,
  'end': 80656510},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'TT'}}

The VRS variant representations should be the same.

In [6]:
assert vrs_from_hgvs == vrs_from_spdi
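
One reason the two inputs agree is that HGVS genomic positions are 1-based and inclusive, while VRS uses interbase coordinates. The following sketch, which assumes only simple `g.` `delins` expressions and uses a hypothetical helper name `hgvs_g_delins_to_interbase`, shows the conversion by hand. It is not how the hgvs package or the *AlleleTranslator* parses HGVS, and it performs no normalization.

```python
import re

# Matches only simple genomic delins forms such as
# "NC_000005.10:g.80656510delinsTT" or "...:g.100_102delinsA".
_DELINS = re.compile(
    r"(?P<ac>[^:]+):g\.(?P<first>\d+)(?:_(?P<last>\d+))?delins(?P<alt>[ACGT]+)"
)


def hgvs_g_delins_to_interbase(hgvs):
    """Convert a simple g. delins expression to interbase coordinates."""
    match = _DELINS.fullmatch(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs}")
    first = int(match["first"])
    # A single-position delins has no explicit end; it spans one residue.
    last = int(match["last"]) if match["last"] else first
    return {
        "sequence": match["ac"],
        "start": first - 1,  # 1-based inclusive start -> interbase start
        "end": last,         # 1-based inclusive end equals interbase end
        "alt": match["alt"],
    }


hgvs_coords = hgvs_g_delins_to_interbase("NC_000005.10:g.80656510delinsTT")
print(hgvs_coords)
```

The HGVS position 80656510 becomes the interbase span 80656509 to 80656510, the same span implied by the SPDI position 80656509 with a one-base deletion.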