# 6. New and Upcoming Features in VRS 2.x
The VRS 2.0 specification is under active development, and VRS-Python already implements several of its new features in preparation for the release. This notebook covers some of them.

## Prerequisites - Set Up Data Proxy Access
The *DataProxy* provides access to sequence references.

In [1]:
from ga4gh.vrs.dataproxy import create_dataproxy
seqrepo_rest_service_url = "seqrepo+https://services.genomicmedlab.org/seqrepo"
seqrepo_dataproxy = create_dataproxy(uri=seqrepo_rest_service_url)

Import the *AlleleTranslator* class and create a translator backed by the data proxy.

In [2]:
from ga4gh.vrs.extras.translator import AlleleTranslator
translator = AlleleTranslator(data_proxy=seqrepo_dataproxy)

Translating to or from HGVS requires access to a UTA database, so we set its connection URL in the `UTA_DB_URL` environment variable.

In [3]:
import os
os.environ["UTA_DB_URL"] = "postgresql://anonymous:anonymous@uta.biocommons.org:5432/uta/uta_20210129b"

## VRS Object Metadata

First, we start with an Allele from our previous examples.

In [4]:
allele = translator.translate_from("NC_000005.10:80656509:C:TT", "spdi")
allele.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'type': 'Allele',
 'digest': 'LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'location': {'id': 'ga4gh:SL.nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'type': 'SequenceLocation',
  'digest': 'nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'},
  'start': 80656509,
  'end': 80656510},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'TT'}}
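
To make the relationship between the SPDI string and the `start`/`end` values above explicit, here is a minimal sketch of the coordinate arithmetic. The `parse_spdi` helper and `SpdiParts` type are hypothetical names introduced for illustration; they are not part of VRS-Python, and the sketch skips the reference validation and normalization that the translator performs.

```python
from typing import NamedTuple


class SpdiParts(NamedTuple):
    sequence: str
    start: int     # 0-based interbase start, taken directly from the SPDI position
    end: int       # interbase end = start + length of the deleted sequence
    inserted: str  # replacement sequence


def parse_spdi(expression: str) -> SpdiParts:
    """Split an SPDI string and derive interbase start/end coordinates.

    Illustrative only: no validation against the reference, no normalization.
    """
    sequence, position, deleted, inserted = expression.split(":")
    start = int(position)
    # SPDI permits the deletion to be written as a sequence or as a length.
    deleted_length = int(deleted) if deleted.isdigit() else len(deleted)
    return SpdiParts(sequence, start, start + deleted_length, inserted)


parts = parse_spdi("NC_000005.10:80656509:C:TT")
print(parts.start, parts.end, parts.inserted)  # 80656509 80656510 TT
```

The one-base deletion `C` spans the interbase interval from 80656509 to 80656510, and the inserted `TT` becomes the literal sequence in `state`.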

This Allele, like all variant and location objects in VRS, has several useful fields for describing object metadata.

### Describing a Sequence

The location of our Allele is a VRS `SequenceLocation` object.

In [5]:
seqloc = allele.location
seqloc.model_dump(exclude_none=True)

{'id': 'ga4gh:SL.nA5-KovovkH-5p3LF1657nkkeWFwrInI',
 'type': 'SequenceLocation',
 'digest': 'nA5-KovovkH-5p3LF1657nkkeWFwrInI',
 'sequenceReference': {'type': 'SequenceReference',
  'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'},
 'start': 80656509,
 'end': 80656510}

The `SequenceLocation` uses a `SequenceReference` object to describe the sequence on which the location is defined:

In [6]:
seqref = seqloc.sequenceReference
seqref.model_dump(exclude_none=True)

{'type': 'SequenceReference',
 'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'}

However, dumping the model without `exclude_none` shows that many additional metadata fields are available for use:

In [7]:
seqref.model_dump()

{'id': None,
 'type': 'SequenceReference',
 'label': None,
 'description': None,
 'alternativeLabels': None,
 'extensions': None,
 'mappings': None,
 'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI',
 'residueAlphabet': None,
 'circular': None}
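
The two dumps above differ only in whether unset fields are shown. As a rough plain-Python illustration of that effect (not how the model library implements `exclude_none`), the hypothetical `drop_none` helper below removes `None`-valued keys from the full dump and recovers the compact form.

```python
def drop_none(value):
    """Recursively remove None-valued keys from nested dicts (lists are traversed)."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(item) for item in value]
    return value


# The full dump of the SequenceReference, copied from the output above.
full_dump = {
    "id": None,
    "type": "SequenceReference",
    "label": None,
    "description": None,
    "alternativeLabels": None,
    "extensions": None,
    "mappings": None,
    "refgetAccession": "SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI",
    "residueAlphabet": None,
    "circular": None,
}

compact = drop_none(full_dump)
print(compact)
```

Only `type` and `refgetAccession` survive, which matches the compact dump shown earlier.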

Only `type` and `refgetAccession` are set, which is the minimal representation of the reference. Additional content can help whoever receives this object understand which sequence it describes. First, we look up the sequence's metadata in SeqRepo:

In [8]:
ref_namespaced_id = f'ga4gh:{seqref.refgetAccession}'
seqrepo_dataproxy.get_metadata(ref_namespaced_id)

{'added': '2016-08-24T08:25:20Z',
 'aliases': ['Ensembl:5',
  'ensembl:5',
  'GRCh38:5',
  'GRCh38:chr5',
  'GRCh38.p1:5',
  'GRCh38.p1:chr5',
  'GRCh38.p10:5',
  'GRCh38.p10:chr5',
  'GRCh38.p11:5',
  'GRCh38.p11:chr5',
  'GRCh38.p12:5',
  'GRCh38.p12:chr5',
  'GRCh38.p2:5',
  'GRCh38.p2:chr5',
  'GRCh38.p3:5',
  'GRCh38.p3:chr5',
  'GRCh38.p4:5',
  'GRCh38.p4:chr5',
  'GRCh38.p5:5',
  'GRCh38.p5:chr5',
  'GRCh38.p6:5',
  'GRCh38.p6:chr5',
  'GRCh38.p7:5',
  'GRCh38.p7:chr5',
  'GRCh38.p8:5',
  'GRCh38.p8:chr5',
  'GRCh38.p9:5',
  'GRCh38.p9:chr5',
  'MD5:f7f05fb7ceea78cbc32ce652c540ff2d',
  'NCBI:NC_000005.10',
  'refseq:NC_000005.10',
  'SEGUID:TuMsXqT81pQNOh4t8oKmnG9F9xM',
  'SHA1:4ee32c5ea4fcd6940d3a1e2df282a69c6f45f713',
  'VMC:GS_aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI',
  'sha512t24u:aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI',
  'ga4gh:SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'],
 'alphabet': 'ACGNT',
 'length': 181538259}
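
Each alias in this record is written as `namespace:identifier`, so picking out the ones we care about is a matter of filtering on the namespace prefix. The sketch below works on a trimmed, hard-coded copy of the record so that it runs without a SeqRepo connection; `aliases_in_namespace` is a hypothetical helper, not a VRS-Python or SeqRepo function.

```python
def aliases_in_namespace(metadata: dict, namespace: str) -> list[str]:
    """Return the aliases from a metadata record that belong to one namespace."""
    prefix = f"{namespace}:"
    return [alias for alias in metadata["aliases"] if alias.startswith(prefix)]


# A trimmed copy of the metadata record shown above.
metadata = {
    "aliases": [
        "GRCh38:5",
        "GRCh38:chr5",
        "GRCh38.p1:5",
        "refseq:NC_000005.10",
        "ga4gh:SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI",
    ],
    "alphabet": "ACGNT",
    "length": 181538259,
}

print(aliases_in_namespace(metadata, "refseq"))  # ['refseq:NC_000005.10']
print(aliases_in_namespace(metadata, "GRCh38"))  # ['GRCh38:5', 'GRCh38:chr5']
```

Matching on `GRCh38:` with the trailing colon keeps patch-level namespaces such as `GRCh38.p1` out of the result.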

We can use some of these data to annotate our sequence reference:

In [9]:
# Use the RefSeq accession as the id and the GRCh38 aliases as human-readable labels.
seqref.id = seqrepo_dataproxy.translate_sequence_identifier(ref_namespaced_id, "refseq")[0]
grch38_aliases = seqrepo_dataproxy.translate_sequence_identifier(ref_namespaced_id, "GRCh38")
seqref.label = grch38_aliases[0]
seqref.alternativeLabels = grch38_aliases[1:]
seqref.model_dump(exclude_none=True)

{'id': 'refseq:NC_000005.10',
 'type': 'SequenceReference',
 'label': 'GRCh38:5',
 'alternativeLabels': ['GRCh38:chr5'],
 'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'}

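Because `seqloc` and `allele` hold this same `SequenceReference` object, the changes are visible in the parent models. Note that the `id` and `digest` values shown are the same as before: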

In [10]:
seqloc.model_dump(exclude_none=True)

{'id': 'ga4gh:SL.nA5-KovovkH-5p3LF1657nkkeWFwrInI',
 'type': 'SequenceLocation',
 'digest': 'nA5-KovovkH-5p3LF1657nkkeWFwrInI',
 'sequenceReference': {'id': 'refseq:NC_000005.10',
  'type': 'SequenceReference',
  'label': 'GRCh38:5',
  'alternativeLabels': ['GRCh38:chr5'],
  'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'},
 'start': 80656509,
 'end': 80656510}

In [11]:
allele.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'type': 'Allele',
 'digest': 'LK_4rOVxyEwrEpaOVd-BDFV0ocbO5vgV',
 'location': {'id': 'ga4gh:SL.nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'type': 'SequenceLocation',
  'digest': 'nA5-KovovkH-5p3LF1657nkkeWFwrInI',
  'sequenceReference': {'id': 'refseq:NC_000005.10',
   'type': 'SequenceReference',
   'label': 'GRCh38:5',
   'alternativeLabels': ['GRCh38:chr5'],
   'refgetAccession': 'SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI'},
  'start': 80656509,
  'end': 80656510},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'TT'}}
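
Two things happen in the cells above: the annotation shows up in the parent objects, and the identifiers do not change. The toy sketch below illustrates both, assuming (as the VRS specification describes) that digests are computed only from a fixed set of identifying fields and not from descriptive metadata. The `Toy*` classes and the serialization are hypothetical simplifications, so the digests they produce are not real VRS digests; only `sha512t24u` follows the GA4GH truncated-digest definition.

```python
import base64
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def sha512t24u(blob: bytes) -> str:
    """GA4GH truncated digest: first 24 bytes of SHA-512, base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


@dataclass
class ToySequenceReference:
    refgetAccession: str
    label: Optional[str] = None  # descriptive metadata, ignored by the toy digest


@dataclass
class ToySequenceLocation:
    sequenceReference: ToySequenceReference
    start: int
    end: int

    def digest(self) -> str:
        # Only identifying fields are serialized; the label is left out on purpose.
        identifying = {
            "end": self.end,
            "sequenceReference": self.sequenceReference.refgetAccession,
            "start": self.start,
        }
        blob = json.dumps(identifying, sort_keys=True, separators=(",", ":")).encode("ascii")
        return sha512t24u(blob)


ref = ToySequenceReference("SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI")
loc = ToySequenceLocation(ref, 80656509, 80656510)
before = loc.digest()

ref.label = "GRCh38:5"  # mutate the child object in place

print(loc.sequenceReference.label)  # the parent sees the change: GRCh38:5
print(loc.digest() == before)       # the toy digest is unaffected: True
```

Keeping descriptive fields out of the digest means two parties can annotate the same variant differently and still arrive at the same identifier.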