# Inlined v. Referenced Objects
This notebook explores the potential benefits and drawbacks of supporting both inlined and referenced objects.

![graphic](https://i.imgur.com/vBY3Pu8.png)

In [47]:
from ga4gh.core import ga4gh_identify
from ga4gh.vr import models
from ga4gh.vr.extras.dataproxy import SeqRepoRESTDataProxy
from ga4gh.vr.extras.translator import Translator

from tabulate import tabulate
from IPython.display import HTML, display

def make_table(td, **kwargs):
    """Render rows (the first row is the header) as an HTML table in the notebook."""
    display(HTML(tabulate(tablefmt="html", tabular_data=td, headers="firstrow", **kwargs)))

seqrepo_rest_service_url = "http://localhost:5000/seqrepo"

data_proxy = SeqRepoRESTDataProxy(base_url=seqrepo_rest_service_url)
tlr = Translator(data_proxy=data_proxy)

In [2]:
from IPython.core.magic import register_cell_magic

@register_cell_magic
def ignore_exceptions(line, cell):
    """Cell magic: run the cell and print any exception as one line instead of a traceback."""
    try:
        exec(cell)
    except Exception as e:
        print(f"{type(e).__name__}: {e}")

## Review of basic operations

In [3]:
a = tlr.from_hgvs("NC_000013.11:g.32936732G>A")
a

<Allele _id=<Literal<str> ga4gh:VA.mpIbo0Vv4HT-Oh3g5SWcuzAR2mue3yL-> location=<SequenceLocation _id=None interval=<SimpleInterval end=<Literal<int> 32936732> start=<Literal<int> 32936731> type=<Literal<str> SimpleInterval>> sequence_id=<Literal<str> ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT> type=<Literal<str> SequenceLocation>> state=<SequenceState sequence=<Literal<str> A> type=<Literal<str> SequenceState>> type=<Literal<str> Allele>>

In [4]:
a_dict = a.as_dict()
a_dict

{'_id': 'ga4gh:VA.mpIbo0Vv4HT-Oh3g5SWcuzAR2mue3yL-',
 'location': {'interval': {'end': 32936732,
   'start': 32936731,
   'type': 'SimpleInterval'},
  'sequence_id': 'ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT',
  'type': 'SequenceLocation'},
 'state': {'sequence': 'A', 'type': 'SequenceState'},
 'type': 'Allele'}

In [5]:
a2 = models.Allele(**a_dict)
a2

<Allele _id=<Literal<str> ga4gh:VA.mpIbo0Vv4HT-Oh3g5SWcuzAR2mue3yL-> location=<SequenceLocation _id=None interval=<SimpleInterval end=<Literal<int> 32936732> start=<Literal<int> 32936731> type=<Literal<str> SimpleInterval>> sequence_id=<Literal<str> ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT> type=<Literal<str> SequenceLocation>> state=<SequenceState sequence=<Literal<str> A> type=<Literal<str> SequenceState>> type=<Literal<str> Allele>>

In [6]:
a == a2

True

---
## New: ability to replace inlined objects with referenced objects
For example, the value of `Allele.location` was previously required to be an instance of a `Location` subclass. It may now be either a `Location` subclass instance *or* a CURIE that is expected to reference a `Location`.
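Because `location` may now hold either form, a client that needs the Location's contents has to dereference CURIEs itself. The following is a minimal sketch, assuming locations are handled as plain dicts (the `as_dict()` form) and that references can be looked up in a simple in-memory mapping. `resolve_location` and the store are hypothetical and not part of `ga4gh.vr`.

```python
from typing import Any, Dict, Union

# Hypothetical in-memory store mapping CURIEs to serialized Location dicts.
# This is an illustration only; ga4gh.vr does not provide such a store.
LocationStore = Dict[str, Dict[str, Any]]


def resolve_location(location: Union[str, Dict[str, Any]],
                     store: LocationStore) -> Dict[str, Any]:
    """Return an inlined Location dict, dereferencing a CURIE if necessary."""
    if isinstance(location, str):
        # A bare string is treated as a CURIE reference and looked up.
        try:
            return store[location]
        except KeyError:
            raise KeyError(f"unresolvable Location reference: {location}") from None
    # Otherwise the Location is already inlined; return it unchanged.
    return location


if __name__ == "__main__":
    vsl = {
        "_id": "ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq",
        "interval": {"end": 32936732, "start": 32936731, "type": "SimpleInterval"},
        "sequence_id": "ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT",
        "type": "SequenceLocation",
    }
    store = {vsl["_id"]: vsl}
    # Both forms resolve to the same Location.
    print(resolve_location(vsl["_id"], store) == resolve_location(vsl, store))
```

A real deployment would back the store with a database or a REST lookup. In either case the extra round trip is part of the cost of allowing references.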

In [7]:
a_inlined = tlr.from_hgvs("NC_000013.11:g.32936732G>A")
a_inlined.location._id = ga4gh_identify(a_inlined.location)
a_inlined.as_dict()

{'_id': 'ga4gh:VA.mpIbo0Vv4HT-Oh3g5SWcuzAR2mue3yL-',
 'location': {'_id': 'ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq',
  'interval': {'end': 32936732, 'start': 32936731, 'type': 'SimpleInterval'},
  'sequence_id': 'ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT',
  'type': 'SequenceLocation'},
 'state': {'sequence': 'A', 'type': 'SequenceState'},
 'type': 'Allele'}

In [8]:
# copy a_inlined, and replace inlined location with referenced location
a_refd = models.Allele(**a_inlined.as_dict())
a_refd.location = a_refd.location._id
a_refd.as_dict()

{'_id': 'ga4gh:VA.mpIbo0Vv4HT-Oh3g5SWcuzAR2mue3yL-',
 'location': 'ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq',
 'state': {'sequence': 'A', 'type': 'SequenceState'},
 'type': 'Allele'}
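The replacement in cell [8] can be generalized. The sketch below works on `as_dict()` output rather than on `ga4gh.vr` model objects. It swaps every directly nested object that carries an `_id` for that CURIE and collects the extracted objects, and it reverses the operation when given those objects. The function names are hypothetical, and only one level of nesting is handled.

```python
import copy
from typing import Any, Dict, Tuple

Obj = Dict[str, Any]


def to_referenced(obj: Obj) -> Tuple[Obj, Dict[str, Obj]]:
    """Replace each directly nested object that has an `_id` with that CURIE.

    Returns the rewritten copy and a store of the extracted objects.
    The input is left unmodified.
    """
    store: Dict[str, Obj] = {}
    out = copy.deepcopy(obj)
    for key, value in out.items():
        if isinstance(value, dict) and "_id" in value:
            store[value["_id"]] = value
            out[key] = value["_id"]  # reassigning an existing key is safe here
    return out, store


def to_inlined(obj: Obj, store: Dict[str, Obj]) -> Obj:
    """Replace CURIE-valued attributes with copies of the objects they reference."""
    out = copy.deepcopy(obj)
    for key, value in out.items():
        # Skip the object's own identifier; only attribute values are dereferenced.
        if key != "_id" and isinstance(value, str) and value in store:
            out[key] = copy.deepcopy(store[value])
    return out


if __name__ == "__main__":
    allele = {
        "_id": "ga4gh:VA.mpIbo0Vv4HT-Oh3g5SWcuzAR2mue3yL-",
        "location": {
            "_id": "ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq",
            "interval": {"end": 32936732, "start": 32936731, "type": "SimpleInterval"},
            "sequence_id": "ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT",
            "type": "SequenceLocation",
        },
        "state": {"sequence": "A", "type": "SequenceState"},
        "type": "Allele",
    }
    refd, store = to_referenced(allele)
    print(refd["location"])                     # the Location's CURIE
    print(to_inlined(refd, store) == allele)    # round trip restores the original
```

Note that the round trip only works if the referenced object already carries its `_id`, which is why cell [7] computes it with `ga4gh_identify` first.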

---
## Concern: Usage complexity
When `location` can be a CURIE, clients must check which form the value takes before accessing Location attributes such as `interval`.

In [9]:
a_inlined.location.interval

<SimpleInterval end=<Literal<int> 32936732> start=<Literal<int> 32936731> type=<Literal<str> SimpleInterval>>

In [10]:
%%ignore_exceptions
a_refd.location.interval

AttributeError: '#/definitions/CURIE' object has no attribute 'interval'


#### Is this really a new problem? No and yes.

No: `Location` subclasses may have different types, so clients have always needed to be type-aware.

Yes: those types have always been distinguishable by their `type` attribute. A CURIE is a bare string with no `type` attribute, so it cannot be distinguished the same way.
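In practice, a client must therefore test whether it holds a CURIE before it can dispatch on `type`. The following is a minimal sketch on `as_dict()`-style values. With the model objects, the equivalent check would be on the Python class, as cells [14] and [15] show. `describe_location` is a hypothetical helper that handles only `SequenceLocation`.

```python
from typing import Any, Dict, Union


def describe_location(location: Union[str, Dict[str, Any]]) -> str:
    """Describe a Location that may be inlined (dict) or referenced (CURIE string)."""
    # A CURIE carries no `type` attribute, so it must be detected by its
    # Python type before dispatching on the `type` field is possible.
    if isinstance(location, str):
        return f"reference to {location}"
    loc_type = location.get("type")
    if loc_type == "SequenceLocation":
        iv = location["interval"]
        return f"{location['sequence_id']}:{iv['start']}-{iv['end']}"
    raise ValueError(f"unsupported Location type: {loc_type!r}")


if __name__ == "__main__":
    inlined = {
        "interval": {"end": 32936732, "start": 32936731, "type": "SimpleInterval"},
        "sequence_id": "ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT",
        "type": "SequenceLocation",
    }
    print(describe_location(inlined))
    print(describe_location("ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq"))
```

The string check has to come first. Every function that reads `location` needs the same extra branch, which is the usage complexity described above.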

In [11]:
a_inlined.location.type

<Literal<str> SequenceLocation>

In [12]:
a_inlined.location.type == "SequenceLocation"

True

In [13]:
%%ignore_exceptions
a_refd.location.type

AttributeError: '#/definitions/CURIE' object has no attribute 'type'


In [14]:
type(a_inlined.location)

abc.SequenceLocation

In [15]:
type(a_refd.location)

python_jsonschema_objects.classbuilder.#/definitions/CURIE

# Scraps

In [51]:
import json
def pj(o): return json.dumps(o, indent=2).replace("\n","<br>")
td=[
    "attribute inlined referenced".split(),
    ["type", a_inlined.type._value, a_refd.type._value],
    ["state", pj(a_inlined.state.as_dict()), pj(a_refd.state.as_dict())],
    ["location", pj(a_inlined.location.as_dict()), a_refd.location.as_dict()],
]
make_table(td, colalign=("left","left","left"))


| attribute | inlined | referenced |
|---|---|---|
| type | Allele | Allele |
| state | `{"sequence": "A", "type": "SequenceState"}` | `{"sequence": "A", "type": "SequenceState"}` |
| location | `{"_id": "ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq", "interval": {"end": 32936732, "start": 32936731, "type": "SimpleInterval"}, "sequence_id": "ga4gh:SQ._0wi-qoDrvram155UmcSC-zA5ZK4fpLT", "type": "SequenceLocation"}` | `ga4gh:VSL.v9K0mcjQVugxTDIcdi7GBJ_R6fZ1lsYq` |
