# Examples and Validation Tests

This notebook generates YAML schema examples, documentation examples, and validation tests (in [vrs/validation/](https://github.com/ga4gh/vrs/tree/main/validation)). The intention is to have a coherent set of tested examples. Many of the tests build toward alleles, haplotypes, and genotypes of ApoE, which are defined by two SNPs:

```
                             rs7412 
                             NC_000019.9:g.45411941
                             NC_000019.10:g.44908822
                             NM_000041.3:c.526
rs429358                        C          T
NC_000019.9:g.45412079   C   APOE-ε4    APOE-ε1
NC_000019.10:g.44908684  T   APOE-ε3    APOE-ε2
NM_000041.3:c.388
```

In [32]:
import json
import string
import yaml

from IPython.display import display, Markdown

from ga4gh.core import ga4gh_digest, ga4gh_identify, ga4gh_serialize, is_identifiable, sha512t24u
from ga4gh.vrs import __version__, models, normalize, vrs_enref, vrs_deref
__version__

'0.7.0rc4.dev4+gd806dd4'

In [2]:
def filter_dict(d) -> dict:
    """remove keys starting with underscore"""
    try:
        return {k: filter_dict(d[k])
                for k in d
                if not k.startswith("_")}
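    except (AttributeError, TypeError):
        # d is not a dict with string keys (e.g., a scalar or a list); return it unchanged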
        return d
def dump_json(o) -> str:
    """return VRS object as pretty-formatted JSON (string)"""
    return json.dumps(filter_dict(o.as_dict()), indent=2, sort_keys=True)
def dump_tests(o, fns=None) -> str:
    """return VRS object and function results as a YAML test definition (string)"""
    def as_str(s) -> str:
        return s if isinstance(s, str) else s.decode()
    if fns is None:
        fns = [ga4gh_serialize]
        if is_identifiable(o):
            fns += [ga4gh_digest, ga4gh_identify]
    r = {
        "in": o.as_dict(),
        "out": {f.__name__: as_str(f(o)) for f in fns}
    }
    return yaml.dump(filter_dict({o.type._value: {"-": r}})).replace("'-':","-")

all_yaml = ""
def output(o) -> None:
    """dump as json and yaml"""
    global all_yaml
    test_yaml = dump_tests(o)
    all_yaml += test_yaml
    md = [
        "**example object**",
        "```",
        dump_json(o),
        "```",
        "",
        "**validation test**",
        "```",
        test_yaml,
        "```",
    ]
    display(Markdown("\n".join(md)))

In [3]:
# These are the names of all models for which we want examples
examples = set(n for n in models if n[0] in string.ascii_uppercase and "-" not in n)
#for e in sorted(examples): print(f"-[ ] {e}")

----
# External Data

In [4]:
from ga4gh.vrs.dataproxy import SeqRepoRESTDataProxy
seqrepo_rest_service_url = "http://localhost:5000/seqrepo"
dp = SeqRepoRESTDataProxy(base_url=seqrepo_rest_service_url)

In [5]:
def get_sequence(identifier, start=None, end=None):
    """returns sequence for given identifier, optionally limited to interbase <start, end> interval"""
    return dp.get_sequence(identifier, start, end)
def get_sequence_length(identifier):
    """return length of given sequence identifier"""
    return dp.get_metadata(identifier)["length"]
def translate_sequence_identifier(identifier, namespace):
    """for given identifier, return *list* of equivalent identifiers in given namespace"""
    return dp.translate_sequence_identifier(identifier, namespace)

In [6]:
get_sequence_length("ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl")

58617616

In [7]:
start, end = 44908821-25, 44908822+25
get_sequence("ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl", start, end)

'CCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGTACCAGGCCGGGGC'
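
The `start`/`end` pair above is an interbase interval, which behaves like a Python slice. The following is a minimal offline sketch of that idea. It reuses the 51-base string returned above, and the helper name `to_interbase` is hypothetical (it is not part of `ga4gh.vrs`).

```python
from typing import Tuple

def to_interbase(pos_1based: int) -> Tuple[int, int]:
    """Convert a 1-based residue position to an interbase (start, end) pair."""
    return pos_1based - 1, pos_1based

# rs7412 is at NC_000019.10:g.44908822 (1-based), per the table at the top
start, end = to_interbase(44908822)

# Widen by 25 bases on each side, as in the cell above
flank_start, flank_end = start - 25, end + 25

# Sequence returned by get_sequence() above, copied here so the sketch runs offline
flank = "CCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGTACCAGGCCGGGGC"

# An interbase interval has length end - start, exactly like a Python slice
assert len(flank) == flank_end - flank_start

# The reference base at rs7412 sits in the middle of the flank
offset = start - flank_start
rs7412_ref = flank[offset:offset + (end - start)]
print(rs7412_ref)
```

Because the interval is a slice, a zero-length interval (`start == end`) would denote the position between two bases, which is how insertions can be located.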

In [8]:
translate_sequence_identifier("GRCh38:19", "ga4gh")

['ga4gh:GS.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl']

In [9]:
translate_sequence_identifier("ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl", "GRCh38")

['GRCh38:19', 'GRCh38:chr19']

----
# Example objects

## Primitive Classes
Primitive classes do not have a "type" attribute. They make the schema easier to read/understand, and they provide some value validation to promote correctness.

### CURIE

In [10]:
curie = models.CURIE("ncbigene:348")
curie.for_json()

'ncbigene:348'

### HumanCytoband

In [11]:
human_cytoband = models.HumanCytoband("q13.32")
human_cytoband.for_json()

'q13.32'

### Sequence

In [12]:
sequence = models.Sequence("ACGT")
sequence.for_json()

'ACGT'
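
The kind of value validation that primitive classes provide can be pictured as a pattern check on the raw string. The sketch below is illustrative only: the two regular expressions are assumptions made for this example rather than patterns copied from the VRS schema, and `is_curie` / `is_sequence` are hypothetical helpers.

```python
import re

# Illustrative patterns only: assumptions for this sketch, not taken from the schema
CURIE_RE = re.compile(r"^\w[^:]*:.+$")
SEQUENCE_RE = re.compile(r"^[A-Z*\-]*$")

def is_curie(value: str) -> bool:
    """True if value looks like 'prefix:reference'."""
    return CURIE_RE.fullmatch(value) is not None

def is_sequence(value: str) -> bool:
    """True if value contains only uppercase letters, '*', or '-'."""
    return SEQUENCE_RE.fullmatch(value) is not None

print(is_curie("ncbigene:348"), is_curie("348"))
print(is_sequence("ACGT"), is_sequence("acgt"))
```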

## Base Types
These classes have a "type" attribute. 

### Number

In [13]:
number = models.Number(value=55)
output(number)

**example object**
```
{
  "type": "Number",
  "value": 55
}
```

**validation test**
```
Number:
  -
    in:
      type: Number
      value: 55
    out:
      ga4gh_serialize: '{"type":"Number","value":55}'

```

### DefiniteRange

In [14]:
def_range = models.DefiniteRange(
  min=22,
  max=33)
output(def_range)

**example object**
```
{
  "max": 33,
  "min": 22,
  "type": "DefiniteRange"
}
```

**validation test**
```
DefiniteRange:
  -
    in:
      max: 33
      min: 22
      type: DefiniteRange
    out:
      ga4gh_serialize: '{"max":33,"min":22,"type":"DefiniteRange"}'

```

### Gene

In [15]:
gene = models.Gene(gene_id="ncbigene:348")
output(gene)

**example object**
```
{
  "gene_id": "ncbigene:348",
  "type": "Gene"
}
```

**validation test**
```
Gene:
  -
    in:
      gene_id: ncbigene:348
      type: Gene
    out:
      ga4gh_serialize: '{"gene_id":"ncbigene:348","type":"Gene"}'

```

### IndefiniteRange

In [16]:
indef_range = models.IndefiniteRange(
  value=22,
  comparator=">=")
output(indef_range)

**example object**
```
{
  "comparator": ">=",
  "type": "IndefiniteRange",
  "value": 22
}
```

**validation test**
```
IndefiniteRange:
  -
    in:
      comparator: '>='
      type: IndefiniteRange
      value: 22
    out:
      ga4gh_serialize: '{"comparator":">=","type":"IndefiniteRange","value":22}'

```

## Locations and Intervals

### SimpleInterval (DEPRECATED)

In [17]:
simple_interval = models.SimpleInterval(start=44908821, end=44908822)
output(simple_interval)

**example object**
```
{
  "end": 44908822,
  "start": 44908821,
  "type": "SimpleInterval"
}
```

**validation test**
```
SimpleInterval:
  -
    in:
      end: 44908822
      start: 44908821
      type: SimpleInterval
    out:
      ga4gh_serialize: '{"end":44908822,"start":44908821,"type":"SimpleInterval"}'

```

### SequenceInterval

In [18]:
simple_sequence_interval = models.SequenceInterval(start=models.Number(value=44908821), end=models.Number(value=44908822))
output(simple_sequence_interval)

**example object**
```
{
  "end": {
    "type": "Number",
    "value": 44908822
  },
  "start": {
    "type": "Number",
    "value": 44908821
  },
  "type": "SequenceInterval"
}
```

**validation test**
```
SequenceInterval:
  -
    in:
      end:
        type: Number
        value: 44908822
      start:
        type: Number
        value: 44908821
      type: SequenceInterval
    out:
      ga4gh_serialize: '{"end":{"type":"Number","value":44908822},"start":{"type":"Number","value":44908821},"type":"SequenceInterval"}'

```

In [19]:
complex_sequence_interval = models.SequenceInterval(
    start=models.DefiniteRange(min=44908821-100, max=44908821),
    end=models.IndefiniteRange(value=44908822, comparator=">="))
output(complex_sequence_interval)

**example object**
```
{
  "end": {
    "comparator": ">=",
    "type": "IndefiniteRange",
    "value": 44908822
  },
  "start": {
    "max": 44908821,
    "min": 44908721,
    "type": "DefiniteRange"
  },
  "type": "SequenceInterval"
}
```

**validation test**
```
SequenceInterval:
  -
    in:
      end:
        comparator: '>='
        type: IndefiniteRange
        value: 44908822
      start:
        max: 44908821
        min: 44908721
        type: DefiniteRange
      type: SequenceInterval
    out:
      ga4gh_serialize: '{"end":{"comparator":">=","type":"IndefiniteRange","value":44908822},"start":{"max":44908821,"min":44908721,"type":"DefiniteRange"},"type":"SequenceInterval"}'

```
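
One way to read the fuzzy endpoints above is as constraints on a concrete coordinate. The sketch below works on plain dicts shaped like the example objects. The helper `satisfies` is hypothetical, and it assumes that `DefiniteRange` bounds are inclusive and that the only comparators are `>=` and `<=`.

```python
def satisfies(bound: dict, value: int) -> bool:
    """Return True if value is consistent with a Number, DefiniteRange, or IndefiniteRange dict."""
    kind = bound["type"]
    if kind == "Number":
        return value == bound["value"]
    if kind == "DefiniteRange":
        # Assumption: both ends of the range are inclusive
        return bound["min"] <= value <= bound["max"]
    if kind == "IndefiniteRange":
        if bound["comparator"] == ">=":
            return value >= bound["value"]
        if bound["comparator"] == "<=":
            return value <= bound["value"]
        raise ValueError(f"unsupported comparator: {bound['comparator']!r}")
    raise ValueError(f"unsupported type: {kind!r}")

# Endpoints copied from the complex SequenceInterval example above
start = {"type": "DefiniteRange", "min": 44908721, "max": 44908821}
end = {"type": "IndefiniteRange", "value": 44908822, "comparator": ">="}

print(satisfies(start, 44908800), satisfies(end, 44908900))
```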

### SequenceLocation

In [20]:
simple_sequence_location = models.SequenceLocation(
    sequence_id="ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    interval=simple_sequence_interval)
output(simple_sequence_location)

**example object**
```
{
  "interval": {
    "end": {
      "type": "Number",
      "value": 44908822
    },
    "start": {
      "type": "Number",
      "value": 44908821
    },
    "type": "SequenceInterval"
  },
  "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
  "type": "SequenceLocation"
}
```

**validation test**
```
SequenceLocation:
  -
    in:
      interval:
        end:
          type: Number
          value: 44908822
        start:
          type: Number
          value: 44908821
        type: SequenceInterval
      sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
      type: SequenceLocation
    out:
      ga4gh_digest: QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg
      ga4gh_identify: ga4gh:VSL.QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg
      ga4gh_serialize: '{"interval":{"end":{"type":"Number","value":44908822},"start":{"type":"Number","value":44908821},"type":"SequenceInterval"},"sequence_id":"IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl","type":"SequenceLocation"}'

```

In [21]:
complex_sequence_location = models.SequenceLocation(
    sequence_id="ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    interval=complex_sequence_interval)
output(complex_sequence_location)

**example object**
```
{
  "interval": {
    "end": {
      "comparator": ">=",
      "type": "IndefiniteRange",
      "value": 44908822
    },
    "start": {
      "max": 44908821,
      "min": 44908721,
      "type": "DefiniteRange"
    },
    "type": "SequenceInterval"
  },
  "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
  "type": "SequenceLocation"
}
```

**validation test**
```
SequenceLocation:
  -
    in:
      interval:
        end:
          comparator: '>='
          type: IndefiniteRange
          value: 44908822
        start:
          max: 44908821
          min: 44908721
          type: DefiniteRange
        type: SequenceInterval
      sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
      type: SequenceLocation
    out:
      ga4gh_digest: 2ZIY16gPTLbVISuREaRmb0jXGj-_IdRv
      ga4gh_identify: ga4gh:VSL.2ZIY16gPTLbVISuREaRmb0jXGj-_IdRv
      ga4gh_serialize: '{"interval":{"end":{"comparator":">=","type":"IndefiniteRange","value":44908822},"start":{"max":44908821,"min":44908721,"type":"DefiniteRange"},"type":"SequenceInterval"},"sequence_id":"IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl","type":"SequenceLocation"}'

```

### CytobandInterval

In [22]:
cytoband_interval = models.CytobandInterval(
    start=models.HumanCytoband("q13.32"),
    end=models.HumanCytoband("q13.32"))
output(cytoband_interval)

**example object**
```
{
  "end": "q13.32",
  "start": "q13.32",
  "type": "CytobandInterval"
}
```

**validation test**
```
CytobandInterval:
  -
    in:
      end: q13.32
      start: q13.32
      type: CytobandInterval
    out:
      ga4gh_serialize: '{"end":"q13.32","start":"q13.32","type":"CytobandInterval"}'

```

### ChromosomeLocation

In [23]:
chromosome_location = models.ChromosomeLocation(
    species_id="taxonomy:9606",
    chr="19",
    interval=cytoband_interval
)
output(chromosome_location)

**example object**
```
{
  "chr": "19",
  "interval": {
    "end": "q13.32",
    "start": "q13.32",
    "type": "CytobandInterval"
  },
  "species_id": "taxonomy:9606",
  "type": "ChromosomeLocation"
}
```

**validation test**
```
ChromosomeLocation:
  -
    in:
      chr: '19'
      interval:
        end: q13.32
        start: q13.32
        type: CytobandInterval
      species_id: taxonomy:9606
      type: ChromosomeLocation
    out:
      ga4gh_digest: HLH0tBIjV4Vxr_814b41hBsICouJkSN1
      ga4gh_identify: ga4gh:VCL.HLH0tBIjV4Vxr_814b41hBsICouJkSN1
      ga4gh_serialize: '{"chr":"19","interval":{"end":"q13.32","start":"q13.32","type":"CytobandInterval"},"species_id":"taxonomy:9606","type":"ChromosomeLocation"}'

```

### DerivedSequenceExpression

In [24]:
dse = models.DerivedSequenceExpression(
    location = simple_sequence_location,
    reverse_complement = False
)
output(dse)

**example object**
```
{
  "location": {
    "interval": {
      "end": {
        "type": "Number",
        "value": 44908822
      },
      "start": {
        "type": "Number",
        "value": 44908821
      },
      "type": "SequenceInterval"
    },
    "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    "type": "SequenceLocation"
  },
  "reverse_complement": false,
  "type": "DerivedSequenceExpression"
}
```

**validation test**
```
DerivedSequenceExpression:
  -
    in:
      location:
        interval:
          end:
            type: Number
            value: 44908822
          start:
            type: Number
            value: 44908821
          type: SequenceInterval
        sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
        type: SequenceLocation
      reverse_complement: false
      type: DerivedSequenceExpression
    out:
      ga4gh_serialize: '{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","reverse_complement":false,"type":"DerivedSequenceExpression"}'

```

### LiteralSequenceExpression

In [25]:
lse = models.LiteralSequenceExpression(
    sequence="ACGT"
)
output(lse)

**example object**
```
{
  "sequence": "ACGT",
  "type": "LiteralSequenceExpression"
}
```

**validation test**
```
LiteralSequenceExpression:
  -
    in:
      sequence: ACGT
      type: LiteralSequenceExpression
    out:
      ga4gh_serialize: '{"sequence":"ACGT","type":"LiteralSequenceExpression"}'

```

### RepeatedSequenceExpression

In [26]:
rse = models.RepeatedSequenceExpression(
    seq_expr = dse,
    count = models.IndefiniteRange(value=6, comparator=">=")
)
output(rse)

**example object**
```
{
  "count": {
    "comparator": ">=",
    "type": "IndefiniteRange",
    "value": 6
  },
  "seq_expr": {
    "location": {
      "interval": {
        "end": {
          "type": "Number",
          "value": 44908822
        },
        "start": {
          "type": "Number",
          "value": 44908821
        },
        "type": "SequenceInterval"
      },
      "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
      "type": "SequenceLocation"
    },
    "reverse_complement": false,
    "type": "DerivedSequenceExpression"
  },
  "type": "RepeatedSequenceExpression"
}
```

**validation test**
```
RepeatedSequenceExpression:
  -
    in:
      count:
        comparator: '>='
        type: IndefiniteRange
        value: 6
      seq_expr:
        location:
          interval:
            end:
              type: Number
              value: 44908822
            start:
              type: Number
              value: 44908821
            type: SequenceInterval
          sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
          type: SequenceLocation
        reverse_complement: false
        type: DerivedSequenceExpression
      type: RepeatedSequenceExpression
    out:
      ga4gh_serialize: '{"count":{"comparator":">=","type":"IndefiniteRange","value":6},"seq_expr":{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","reverse_complement":false,"type":"DerivedSequenceExpression"},"type":"RepeatedSequenceExpression"}'

```

## Molecular Variation

### Allele

In [27]:
# Allele with deprecated SequenceState
allele = models.Allele(location=simple_sequence_location,
                       state=models.SequenceState(sequence="T"))
output(allele)

**example object**
```
{
  "location": {
    "interval": {
      "end": {
        "type": "Number",
        "value": 44908822
      },
      "start": {
        "type": "Number",
        "value": 44908821
      },
      "type": "SequenceInterval"
    },
    "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    "type": "SequenceLocation"
  },
  "state": {
    "sequence": "T",
    "type": "SequenceState"
  },
  "type": "Allele"
}
```

**validation test**
```
Allele:
  -
    in:
      location:
        interval:
          end:
            type: Number
            value: 44908822
          start:
            type: Number
            value: 44908821
          type: SequenceInterval
        sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
        type: SequenceLocation
      state:
        sequence: T
        type: SequenceState
      type: Allele
    out:
      ga4gh_digest: KnG6BLTexv7o-j9LnYsgPxZkRUu1IRnp
      ga4gh_identify: ga4gh:VA.KnG6BLTexv7o-j9LnYsgPxZkRUu1IRnp
      ga4gh_serialize: '{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","state":{"sequence":"T","type":"SequenceState"},"type":"Allele"}'

```

In [28]:
# Allele with LiteralSequenceExpression
allele = models.Allele(location=simple_sequence_location,
                       state=models.LiteralSequenceExpression(sequence="T"))
output(allele)

**example object**
```
{
  "location": {
    "interval": {
      "end": {
        "type": "Number",
        "value": 44908822
      },
      "start": {
        "type": "Number",
        "value": 44908821
      },
      "type": "SequenceInterval"
    },
    "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    "type": "SequenceLocation"
  },
  "state": {
    "sequence": "T",
    "type": "LiteralSequenceExpression"
  },
  "type": "Allele"
}
```

**validation test**
```
Allele:
  -
    in:
      location:
        interval:
          end:
            type: Number
            value: 44908822
          start:
            type: Number
            value: 44908821
          type: SequenceInterval
        sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
        type: SequenceLocation
      state:
        sequence: T
        type: LiteralSequenceExpression
      type: Allele
    out:
      ga4gh_digest: CxiA_hvYbkD8Vqwjhx5AYuyul4mtlkpD
      ga4gh_identify: ga4gh:VA.CxiA_hvYbkD8Vqwjhx5AYuyul4mtlkpD
      ga4gh_serialize: '{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","state":{"sequence":"T","type":"LiteralSequenceExpression"},"type":"Allele"}'

```

### Haplotype

In [29]:
rs7412_38 = models.SequenceLocation(
    sequence_id="ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    interval=models.SequenceInterval(
        start=models.Number(value=44908821),
        end=models.Number(value=44908822)),
)
rs429358_38 = models.SequenceLocation(
    sequence_id="ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    interval=models.SequenceInterval(
        start=models.Number(value=44908683),
        end=models.Number(value=44908684)),
)
rs7412_38C = models.Allele(
    location=rs7412_38,
    state=models.LiteralSequenceExpression(sequence="C")
)
rs429358_38C = models.Allele(
    location=rs429358_38,
    state=models.LiteralSequenceExpression(sequence="C")
)
haplotype = models.Haplotype(members=[rs7412_38C, rs429358_38C])
output(haplotype)

**example object**
```
{
  "members": [
    {
      "location": {
        "interval": {
          "end": {
            "type": "Number",
            "value": 44908822
          },
          "start": {
            "type": "Number",
            "value": 44908821
          },
          "type": "SequenceInterval"
        },
        "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
        "type": "SequenceLocation"
      },
      "state": {
        "sequence": "C",
        "type": "LiteralSequenceExpression"
      },
      "type": "Allele"
    },
    {
      "location": {
        "interval": {
          "end": {
            "type": "Number",
            "value": 44908684
          },
          "start": {
            "type": "Number",
            "value": 44908683
          },
          "type": "SequenceInterval"
        },
        "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
        "type": "SequenceLocation"
      },
      "state": {
        "sequence": "C",
        "type": "LiteralSequenceExpression"
      },
      "type": "Allele"
    }
  ],
  "type": "Haplotype"
}
```

**validation test**
```
Haplotype:
  -
    in:
      members:
      - location:
          interval:
            end:
              type: Number
              value: 44908822
            start:
              type: Number
              value: 44908821
            type: SequenceInterval
          sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
          type: SequenceLocation
        state:
          sequence: C
          type: LiteralSequenceExpression
        type: Allele
      - location:
          interval:
            end:
              type: Number
              value: 44908684
            start:
              type: Number
              value: 44908683
            type: SequenceInterval
          sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
          type: SequenceLocation
        state:
          sequence: C
          type: LiteralSequenceExpression
        type: Allele
      type: Haplotype
    out:
      ga4gh_digest: i8owCOBHIlRCPtcw_WzRFNTunwJRy99-
      ga4gh_identify: ga4gh:VH.i8owCOBHIlRCPtcw_WzRFNTunwJRy99-
      ga4gh_serialize: '{"members":["-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x","Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"],"type":"Haplotype"}'

```

In [30]:
# haplotype normalization puts members in a canonical order
haplotype2 = models.Haplotype(members=[rs429358_38C, rs7412_38C])
assert ga4gh_identify(normalize(haplotype)) == ga4gh_identify(normalize(haplotype2))
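
Order independence of this kind can be obtained by sorting member digests before hashing. The sketch below illustrates the idea with the standard library only. It is not the library's implementation, `haplotype_digest` is a hypothetical helper, and the two member digests are copied from the serialized output above.

```python
import base64
import hashlib
import json

def sha512t24u(blob: bytes) -> str:
    """Base64url encoding of the first 24 bytes of the SHA-512 digest."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")

def haplotype_digest(member_digests) -> str:
    """Digest a Haplotype-like payload after sorting its member digests."""
    payload = {"members": sorted(member_digests), "type": "Haplotype"}
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return sha512t24u(blob)

a = "-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x"
b = "Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"

# The same digest results regardless of the order in which members are supplied
assert haplotype_digest([a, b]) == haplotype_digest([b, a])
```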

In [36]:
object_store = {}
haplotype_ref = vrs_enref(haplotype, object_store)
output(haplotype_ref)

**example object**
```
{
  "members": [
    "ga4gh:VA.-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x",
    "ga4gh:VA.Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"
  ],
  "type": "Haplotype"
}
```

**validation test**
```
Haplotype:
  -
    in:
      members:
      - ga4gh:VA.-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x
      - ga4gh:VA.Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1
      type: Haplotype
    out:
      ga4gh_digest: i8owCOBHIlRCPtcw_WzRFNTunwJRy99-
      ga4gh_identify: ga4gh:VH.i8owCOBHIlRCPtcw_WzRFNTunwJRy99-
      ga4gh_serialize: '{"members":["-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x","Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"],"type":"Haplotype"}'

```

In [37]:
assert ga4gh_identify(haplotype) == ga4gh_identify(haplotype_ref)
ga4gh_identify(haplotype)

'ga4gh:VH.i8owCOBHIlRCPtcw_WzRFNTunwJRy99-'
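
The effect of replacing inline members with references can be sketched on plain dicts. The code below is a toy model under stated assumptions, not how `vrs_enref` is implemented: `toy_identify`, `toy_enref`, and `toy_deref` are hypothetical names, and the toy identifiers will not match the real ones because the members are hashed as-is.

```python
import base64
import hashlib
import json

def sha512t24u(blob: bytes) -> str:
    """Base64url encoding of the first 24 bytes of the SHA-512 digest."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")

def toy_identify(obj: dict, prefix: str) -> str:
    """Build a toy identifier from a compact, key-sorted JSON dump of obj."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return f"ga4gh:{prefix}.{sha512t24u(blob)}"

def toy_enref(container: dict, store: dict) -> dict:
    """Return a copy of container whose members are identifiers; keep the originals in store."""
    refs = []
    for member in container["members"]:
        ident = toy_identify(member, "VA")
        store[ident] = member
        refs.append(ident)
    return {**container, "members": refs}

def toy_deref(container: dict, store: dict) -> dict:
    """Inverse of toy_enref: look each identifier up in store."""
    return {**container, "members": [store[i] for i in container["members"]]}

inline = {
    "type": "Haplotype",
    "members": [
        {"type": "Allele", "state": {"type": "LiteralSequenceExpression", "sequence": "C"}, "location": "loc-1"},
        {"type": "Allele", "state": {"type": "LiteralSequenceExpression", "sequence": "C"}, "location": "loc-2"},
    ],
}
store = {}
referenced = toy_enref(inline, store)
assert toy_deref(referenced, store) == inline
```

The store is what makes the referenced form reversible: without it, the identifiers alone cannot be expanded back into the original members.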

## Systemic Variation

### CopyNumber

In [31]:
copy_number = models.CopyNumber(
    subject=models.Gene(gene_id="ncbigene:348"),
    copies=models.IndefiniteRange(value=3, comparator=">=")
)
output(copy_number)

**example object**
```
{
  "copies": {
    "comparator": ">=",
    "type": "IndefiniteRange",
    "value": 3
  },
  "subject": {
    "gene_id": "ncbigene:348",
    "type": "Gene"
  },
  "type": "CopyNumber"
}
```

**validation test**
```
CopyNumber:
  -
    in:
      copies:
        comparator: '>='
        type: IndefiniteRange
        value: 3
      subject:
        gene_id: ncbigene:348
        type: Gene
      type: CopyNumber
    out:
      ga4gh_digest: xksSWn--_z28Qaj-Udlhot4OKqYGkywy
      ga4gh_identify: ga4gh:VCN.xksSWn--_z28Qaj-Udlhot4OKqYGkywy
      ga4gh_serialize: '{"copies":{"comparator":">=","type":"IndefiniteRange","value":3},"subject":{"gene_id":"ncbigene:348","type":"Gene"},"type":"CopyNumber"}'

```

## Utility Variation

### Text

In [32]:
text_variation = models.Text(definition="APOE loss")
output(text_variation)

**example object**
```
{
  "definition": "APOE loss",
  "type": "Text"
}
```

**validation test**
```
Text:
  -
    in:
      definition: APOE loss
      type: Text
    out:
      ga4gh_digest: 7hhlAaPeqj-sd67nSWXl7WC1yJ-g15tp
      ga4gh_identify: ga4gh:VT.7hhlAaPeqj-sd67nSWXl7WC1yJ-g15tp
      ga4gh_serialize: '{"definition":"APOE loss","type":"Text"}'

```

### VariationSet

In [38]:
variation_set = models.VariationSet(members=[rs7412_38C, rs429358_38C])
output(variation_set)

**example object**
```
{
  "members": [
    {
      "location": {
        "interval": {
          "end": {
            "type": "Number",
            "value": 44908822
          },
          "start": {
            "type": "Number",
            "value": 44908821
          },
          "type": "SequenceInterval"
        },
        "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
        "type": "SequenceLocation"
      },
      "state": {
        "sequence": "C",
        "type": "LiteralSequenceExpression"
      },
      "type": "Allele"
    },
    {
      "location": {
        "interval": {
          "end": {
            "type": "Number",
            "value": 44908684
          },
          "start": {
            "type": "Number",
            "value": 44908683
          },
          "type": "SequenceInterval"
        },
        "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
        "type": "SequenceLocation"
      },
      "state": {
        "sequence": "C",
        "type": "LiteralSequenceExpression"
      },
      "type": "Allele"
    }
  ],
  "type": "VariationSet"
}
```

**validation test**
```
VariationSet:
  -
    in:
      members:
      - location:
          interval:
            end:
              type: Number
              value: 44908822
            start:
              type: Number
              value: 44908821
            type: SequenceInterval
          sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
          type: SequenceLocation
        state:
          sequence: C
          type: LiteralSequenceExpression
        type: Allele
      - location:
          interval:
            end:
              type: Number
              value: 44908684
            start:
              type: Number
              value: 44908683
            type: SequenceInterval
          sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl
          type: SequenceLocation
        state:
          sequence: C
          type: LiteralSequenceExpression
        type: Allele
      type: VariationSet
    out:
      ga4gh_digest: QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE
      ga4gh_identify: ga4gh:VS.QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE
      ga4gh_serialize: '{"members":["-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x","Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"],"type":"VariationSet"}'

```

In [40]:
variation_set_ref = vrs_enref(variation_set)
output(variation_set_ref)

**example object**
```
{
  "members": [
    "ga4gh:VA.-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x",
    "ga4gh:VA.Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"
  ],
  "type": "VariationSet"
}
```

**validation test**
```
VariationSet:
  -
    in:
      members:
      - ga4gh:VA.-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x
      - ga4gh:VA.Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1
      type: VariationSet
    out:
      ga4gh_digest: QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE
      ga4gh_identify: ga4gh:VS.QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE
      ga4gh_serialize: '{"members":["-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x","Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"],"type":"VariationSet"}'

```

In [41]:
assert ga4gh_identify(variation_set) == ga4gh_identify(variation_set_ref)
ga4gh_identify(variation_set)

'ga4gh:VS.QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE'

----
# Functions

### Truncated Digest (sha512t24u)

In [34]:
sha512t24u(b"")

'z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'

In [35]:
sha512t24u(b"ACGT")

'aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
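
As the name suggests, the truncated digest is SHA-512, truncated to 24 bytes, then base64url-encoded. A minimal standard-library sketch follows. The name `sha512t24u_sketch` is used here to avoid confusion with the function imported from `ga4gh.core`.

```python
import base64
import hashlib

def sha512t24u_sketch(blob: bytes) -> str:
    """SHA-512 digest of blob, truncated to 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")

print(sha512t24u_sketch(b""))
print(sha512t24u_sketch(b"ACGT"))
```

Since 24 bytes is a multiple of 3, the base64url output never needs `=` padding and is always 32 characters long.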

### Digest Serialization (`ga4gh_serialize`)

The GA4GH digest serialization form resembles JSON, but the specification constrains it so that all implementations produce the same binary payload.

In [36]:
allele = models.Allele(location=models.SequenceLocation(
    sequence_id="ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    interval=simple_interval),
    state=models.SequenceState(sequence="T"))
ga4gh_serialize(allele)

b'{"location":"u5fspwVbQ79QkX6GHLF8tXPCAXFJqRPx","state":{"sequence":"T","type":"SequenceState"},"type":"Allele"}'
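
The serialized outputs in this notebook suggest three visible rules: keys are sorted, whitespace is omitted, and `ga4gh:` identifiers are reduced to their bare digests. The sketch below applies only those rules to plain dicts. It is a simplification, `serialize_sketch` is a hypothetical name, and it does not replace nested identifiable objects (such as an Allele's location) with their digests as the real function does.

```python
import json
import re

# Matches identifiers such as ga4gh:SQ.<32-character digest>
GA4GH_ID = re.compile(r"^ga4gh:[A-Z]+\.([0-9A-Za-z_-]{32})$")

def simplify(node):
    """Drop underscore-prefixed keys and reduce ga4gh identifiers to bare digests."""
    if isinstance(node, dict):
        return {k: simplify(v) for k, v in node.items() if not k.startswith("_")}
    if isinstance(node, list):
        return [simplify(v) for v in node]
    if isinstance(node, str):
        match = GA4GH_ID.match(node)
        return match.group(1) if match else node
    return node

def serialize_sketch(obj: dict) -> bytes:
    """Compact, key-sorted, UTF-8 encoded JSON of the simplified object."""
    return json.dumps(simplify(obj), sort_keys=True, separators=(",", ":")).encode("utf-8")

location = {
    "type": "SequenceLocation",
    "sequence_id": "ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
    "interval": {
        "type": "SequenceInterval",
        "start": {"type": "Number", "value": 44908821},
        "end": {"type": "Number", "value": 44908822},
    },
}
print(serialize_sketch(location).decode())
```

For this input the result matches the `ga4gh_serialize` string shown in the SequenceLocation validation test earlier in the notebook.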

### Object Digest (`ga4gh_digest`)
VRS computed identifiers are constructed from digests of serialized objects by prefixing the digest with `ga4gh:` and a type-specific code (for example, `VA` for an Allele).

In [37]:
# ga4gh_digest serializes the allele and returns a base64url-encoded truncated digest
ga4gh_digest(allele)

'EgHPXXhULTwoP4-ACfs-YCXaeUQJBjH_'

In [38]:
# Which is equivalent to
sha512t24u(ga4gh_serialize(allele))

'EgHPXXhULTwoP4-ACfs-YCXaeUQJBjH_'

### Object Computed Identifier (`ga4gh_identify`)

In [39]:
ga4gh_identify(allele)

'ga4gh:VA.EgHPXXhULTwoP4-ACfs-YCXaeUQJBjH_'
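
Putting the pieces together, a computed identifier is the `ga4gh:` namespace, a type prefix, a dot, and the digest of the serialized object. The sketch below assembles one from already-serialized bytes. The prefix table lists only the codes that appear in this notebook's outputs, and `identify_sketch` is a hypothetical helper.

```python
import base64
import hashlib

# Type prefixes as they appear in the identifiers shown in this notebook
TYPE_PREFIXES = {
    "Allele": "VA",
    "Haplotype": "VH",
    "SequenceLocation": "VSL",
    "ChromosomeLocation": "VCL",
    "CopyNumber": "VCN",
    "Text": "VT",
    "VariationSet": "VS",
}

def identify_sketch(type_name: str, serialized: bytes) -> str:
    """Return ga4gh:<prefix>.<digest> for an already-serialized object."""
    digest = base64.urlsafe_b64encode(hashlib.sha512(serialized).digest()[:24]).decode("ascii")
    return f"ga4gh:{TYPE_PREFIXES[type_name]}.{digest}"

# Serialized form of the Text example, copied from its validation test
text_id = identify_sketch("Text", b'{"definition":"APOE loss","type":"Text"}')
print(text_id)
```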

# Write test yaml

In [40]:
from pathlib import Path
# write_text closes the file and returns the number of characters written
Path("/tmp/validation.yaml").write_text(all_yaml)

9601