# Relative References in JSON Schema
With recent updates to the schema build structure, GKS schemas now exist across multiple, cross-referenced files in the schema directory, including imports from other upstream sources.

To resolve these cross-references, each schema needs its `$id` keyword set.

First, a look at the situation _without_ doing so:

In [37]:
import re
from pathlib import Path
import jsonschema as js
import json
import os

root_dir = Path(os.getcwd()).parent
SCHEMA_DIR = root_dir / "schema"
vrs_jsons_path = SCHEMA_DIR / "vrs" / "json"

In [38]:
sl_schema_filepath = vrs_jsons_path / 'SequenceLocation'
sl = {
    'sequenceReference': {
        'refgetAccession': 'SQ.9W6SPR3RMCHWCSGJLQHE6KBOD285V5SW',
        'type':'SequenceReference'
    },
    'start': 100,
    'end': [None, 150],
    'type': 'SequenceLocation'
}

with open(sl_schema_filepath, 'r') as sl_js_file:
    sl_schema = json.load(sl_js_file)
js.validate(sl, sl_schema)

_WrappedReferencingError: Unresolvable: SequenceReference

The `_WrappedReferencingError` tells us that, without a base URI provided by `$id`, there is no way to resolve the relative references. This makes sense, since the validator has no idea where the loaded JSON came from!
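To make the role of the base URI concrete, here is a minimal sketch that uses only the standard library's `urljoin` to resolve a relative reference against a base. The `https://example.org/...` base and the reference strings are hypothetical placeholders, and `resolve_ref` is an illustrative helper, not part of `jsonschema` or `referencing`.

```python
from urllib.parse import urljoin

# Hypothetical base URI standing in for a schema's `$id`.
base_id = "https://example.org/schema/vrs/json/SequenceLocation"


def resolve_ref(base: str, ref: str) -> str:
    """Resolve a relative reference against a base URI."""
    return urljoin(base, ref)


# A sibling reference replaces the last path segment of the base.
sibling = resolve_ref(base_id, "SequenceReference")

# A root-relative reference keeps only the scheme and host of the base.
rooted = resolve_ref(base_id, "/ga4gh/schema/gks-common/1.x/json/Extension")

# With an empty base, the reference comes back unchanged:
# there is nothing to anchor it to.
no_base = resolve_ref("", "SequenceReference")

print(sibling)
print(rooted)
print(no_base)
```

The last case mirrors the failure above: a bare name with no base is not a location anything can be loaded from.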

We can address this by using the `referencing` library and pre-loading all of our schemas:

In [39]:
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012
from jsonschema import Draft202012Validator

In [58]:
ga4gh_re = re.compile(r'.*\/ga4gh\/schema\/([\w\-\.]+)\/[\w\.]+\/(.*)$')

def retrieve_rel_ref(ga4gh_ref: str) -> Resource:
    """Load a schema from SCHEMA_DIR for a reference under /ga4gh/schema/."""
    ga4gh_match = ga4gh_re.match(ga4gh_ref)
    if ga4gh_match is None:
        raise ValueError(f'ga4gh_ref {ga4gh_ref} is not a root GA4GH reference')
    # Group 1 is the schema module; group 2 is the path after the version segment.
    schema_module = ga4gh_match.group(1)
    local_path = ga4gh_match.group(2)
    resolved_path = (SCHEMA_DIR / schema_module / local_path).resolve()
    schema = json.loads(resolved_path.read_text())
    return Resource.from_contents(schema)


vrs_js_registry = Registry(retrieve=retrieve_rel_ref)
vrs_js = dict()
vrs_validator = dict()

for schema_path in vrs_jsons_path.glob('*'):
    content = json.loads(schema_path.read_text())
    schema_uri = schema_path.as_uri()
    content['$id'] = schema_uri  # '$id' (not 'id') is the base-URI keyword in Draft 2020-12
    schema_resource = Resource(contents=content, specification=DRAFT202012)
    vrs_js[schema_path.stem] = content
    # Registry is immutable: with_resources returns a new registry, so rebind it.
    vrs_js_registry = vrs_js_registry.with_resources([
        (schema_path.name, schema_resource),
        (schema_uri, schema_resource)
    ])

for cls in vrs_js:
    vrs_validator[cls] = Draft202012Validator(vrs_js[cls], registry=vrs_js_registry)

Now we can validate! References to schemas in the VRS JSON directory are resolved from the registry, while relative references to schemas outside that directory are handled by the `retrieve_rel_ref` function.
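To see how `retrieve_rel_ref` turns a reference into a local file path, the regular expression can be exercised on its own. The sketch below reuses the same pattern; `split_ga4gh_ref` is a hypothetical helper introduced for illustration, and the reference strings are made-up examples rather than confirmed schema locations.

```python
import re
from pathlib import PurePosixPath

# Same pattern as above: module name, a version segment that is skipped,
# then the remaining path.
ga4gh_re = re.compile(r'.*\/ga4gh\/schema\/([\w\-\.]+)\/[\w\.]+\/(.*)$')


def split_ga4gh_ref(ref: str) -> tuple[str, str]:
    """Return (schema_module, local_path) for a reference under /ga4gh/schema/."""
    match = ga4gh_re.match(ref)
    if match is None:
        raise ValueError(f"{ref} is not a root GA4GH reference")
    return match.group(1), match.group(2)


# Hypothetical root-relative reference.
module, local_path = split_ga4gh_ref("/ga4gh/schema/gks-common/1.x/json/Extension")

# Path that would be opened relative to the schema directory.
relative_file = PurePosixPath(module) / local_path
print(relative_file)
```

Because the version segment is matched but not captured, this approach assumes the local schema directory is laid out by module name only, with no version directory. A bare name such as `SequenceReference` does not match the pattern and raises `ValueError`, which is why the VRS schemas themselves have to be present in the registry.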

In [59]:
vrs_validator['SequenceLocation'].validate(sl)

In [60]:
a = {
    'location': sl,
    'state': {
        'type': 'ReferenceLengthExpression',
        'length': [32, 35],
        'repeatSubunitLength': 3
    },
    'type': 'Allele'
}
vrs_validator['Allele'].validate(a)