# Relative References in JSON Schema
With recent updates to the schema build structure, GKS schemas now exist across multiple, cross-referenced files in the schema directory, including imports from other upstream sources.

To handle this, schemas need to have their '$id' attribute set.

First, a look at the situation _without_ doing so:

In [15]:
from pathlib import Path
import jsonschema as js
import json
import os

root_dir = Path(os.getcwd()).parent
schema_dir = root_dir / "schema"
vrs_jsons_path = schema_dir / "json"

In [17]:
sl_schema_filepath = vrs_jsons_path / 'SequenceLocation.json'
sl = {
    'sequenceReference': {
        'refgetAccession': 'SQ.9W6SPR3RMCHWCSGJLQHE6KBOD285V5SW',
        'type':'SequenceReference'
    },
    'start': 100,
    'end': [None, 150],
    'type': 'SequenceLocation'
}

with open(sl_schema_filepath, 'r') as sl_js_file:
    sl_schema = json.load(sl_js_file)
js.validate(sl, sl_schema)

_WrappedReferencingError: Unresolvable: SequenceReference.json

The `_WrappedReferencingError` informs us that, without an anchor provided by `$id`, there is no way to resolve the relative references. This makes sense, since the system has no idea where the loaded JSON came from!

We can address this by using the `referencing` library and pre-loading all of our schemas:

In [69]:
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012
from jsonschema import Draft202012Validator

def retrieve_rel_ref(rel_ref: str):
    resolved_path = (vrs_jsons_path / rel_ref).resolve()
    schema = json.loads(resolved_path.read_text())
    return Resource.from_contents(schema)

vrs_js_registry = Registry(retrieve=retrieve_rel_ref)
vrs_js = dict()
vrs_validator = dict()

for schema_path in vrs_jsons_path.glob('*.json'):
    content = json.loads(schema_path.read_text())
    schema_uri = schema_path.as_uri()
    content['id'] = schema_uri
    schema_resource = Resource(contents=content, specification=DRAFT202012)
    vrs_js[schema_path.stem] = content
    vrs_schemas = vrs_js_registry.with_resources([
        (schema_path.name, schema_resource),
        (schema_uri, schema_resource)
    ])

for cls in vrs_js:
    vrs_validator[cls] = Draft202012Validator(vrs_js[cls], registry=vrs_js_registry)

In [78]:
vrs_validator['SequenceLocation'].validate(sl)

Now we can validate! Calls to refs from the JSONs directory are retrievable from the registry, and relative refs outside of the VRS JSONs are handled by the `retrieve_rel_ref` method.

In [80]:
a = {
    'location': sl,
    'state': {
        'type': 'ReferenceLengthExpression',
        'length': [32, 35],
        'repeatSubunitLength': 3
    },
    'type': 'Allele'
}
vrs_validator['Allele'].validate(a)

_RefResolutionError: unknown url type: 'SequenceLocation.json'

In [76]:
vrs_validator['Allele']

{'$schema': 'https://json-schema.org/draft/2020-12/schema',
 'title': 'Allele',
 'type': 'object',
 'maturity': 'draft',
 'ga4ghDigest': {'prefix': 'VA', 'keys': ['location', 'state', 'type']},
 'description': 'The state of a molecule at a Location.',
 'properties': {'id': {'type': 'string',
   'description': "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is  unique within a given system. The identified entity may have a different 'id' in a different  system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)."},
  'label': {'type': 'string',
   'description': 'A primary label for the entity.'},
  'description': {'type': 'string',
   'description': 'A free-text description of the entity.'},
  'extensions': {'type': 'array',
   'ordered': True,
   'items': {'$ref': '../import/gks-common/json/Extension.json'}},
  'type': {'type': 'string',
   'const': 'Allele',
   'default': 'Allele',
   'description': 'MUST be "Allele