# Validation Test Data
This notebook generates validation data that appears in ga4gh/vmc/tests/validation.yaml.

In [13]:
import vmc
from vmc.serialize import serialize_vmc

---
## VMC digest
The VMC digest is a convention for applying well-known, existing technology to generate a compact, effectively unique fingerprint of a string.

In [2]:
vmc.digest("")

'z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'

In [3]:
vmc.digest("", digest_size=12)

'z4PhNX7vuL3xVChQ'

In [4]:
vmc.digest("ACGT")

'aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
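The outputs above are consistent with a SHA-512 hash that is truncated to `digest_size` bytes and then base64url-encoded, which would explain the 32-character default and the 16-character result for `digest_size=12`. A minimal sketch under that assumption follows; `truncated_digest` is an illustrative name, not necessarily what `vmc` calls this step internally.

```python
import base64
import hashlib


def truncated_digest(data: bytes, digest_size: int = 24) -> str:
    """Base64url-encode the first `digest_size` bytes of SHA-512(data).

    Assumption: this mirrors what vmc.digest does for string input.
    """
    full = hashlib.sha512(data).digest()
    return base64.urlsafe_b64encode(full[:digest_size]).decode("ascii")


print(truncated_digest(b""))                  # 24 bytes encode to 32 characters
print(truncated_digest(b"", digest_size=12))  # 12 bytes encode to 16 characters
print(truncated_digest(b"ACGT"))
```

Sizes that are multiples of 3 bytes, such as 12 and 24, encode without `=` padding, which keeps the digest safe to embed in identifiers.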

## Regions and Locations

```
rs1142345C = NC_000006.11:g.18130918T>C
rs1800460T = NC_000006.11:g.18139228C>T
```


In [11]:
sr = vmc.models.SimpleRegion(start=18130917, end=18130918)

In [15]:
serialize_vmc(sr)

'<SimpleRegion|18130917|18130918>'

In [18]:
vmc.serialize(sr)

b'{"end":18130918,"start":18130917,"type":"SimpleRegion"}'
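The region `start=18130917, end=18130918` corresponds to position 18130918 in `NC_000006.11:g.18130918T>C` because VMC uses interbase coordinates, which count the boundaries between residues rather than the residues themselves. The conversion from a 1-based, inclusive range can be sketched as follows; the helper name `to_interbase` is hypothetical and not part of `vmc`.

```python
from typing import Tuple


def to_interbase(first: int, last: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive residue range to interbase (start, end)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # The start boundary sits just before the first residue; the end
    # boundary sits just after the last one.
    return first - 1, last


# rs1142345: single-residue substitution at g.18130918
print(to_interbase(18130918, 18130918))
# rs1800460: single-residue substitution at g.18139228
print(to_interbase(18139228, 18139228))
```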

---
## Translating sequence identifiers to VMC sequence identifiers
Implementing VMC operations requires a sequence lookup service, but how that service is built is left to the implementer. Its most important job is to translate sequence identifiers from RefSeq or other sources into VMC sequence identifiers.

In [8]:
vmc.get_vmc_sequence_identifier("RefSeq:NC_000019.10")

'VMC:GS_IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl'

In [9]:
vmc.get_vmc_sequence_identifier("NC_000019.10")

'VMC:GS_IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl'
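Both calls above return the same identifier whether or not the accession carries a `RefSeq:` namespace. A toy lookup with that behavior might look like the sketch below. The table `KNOWN_SEQUENCES` and the function `to_vmc_sequence_id` are hypothetical; a real service would be backed by a sequence repository rather than a hard-coded dictionary, and the single entry is copied from the output above.

```python
# Hypothetical alias table: accession -> VMC sequence identifier.
KNOWN_SEQUENCES = {
    "NC_000019.10": "VMC:GS_IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl",
}


def to_vmc_sequence_id(identifier: str) -> str:
    """Translate a namespaced or bare accession into a VMC sequence identifier."""
    # Drop an optional namespace such as "RefSeq:"; bare accessions pass through.
    accession = identifier.rpartition(":")[2]
    try:
        return KNOWN_SEQUENCES[accession]
    except KeyError:
        raise KeyError(
            "no VMC sequence identifier known for {!r}".format(identifier)
        ) from None


print(to_vmc_sequence_id("RefSeq:NC_000019.10"))
print(to_vmc_sequence_id("NC_000019.10"))
```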

---
## Objects

In [10]:
def dump(o):
    # Print the serialization, digest, and computed identifier of a VMC object.
    print(vmc.serialize(o))
    print(vmc.digest(o))
    print(vmc.ir_to_id(vmc.computed_identifier(o)))

In [11]:
interval = vmc.models.Interval(start=44908683, end=44908684)
vmc.serialize(interval)

'<Interval|44908683|44908684>'

In [12]:
location = vmc.models.Location(sequence_id="VMC:GS_IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl", interval=interval)
location.id = vmc.computed_id(location)
dump(location)

<Location|VMC:GS_IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl|<Interval|44908683|44908684>>
L1IS6jOwSUsOpKihGRcqxHul1IwbV-1s
VMC:GL_L1IS6jOwSUsOpKihGRcqxHul1IwbV-1s


In [13]:
allele = vmc.models.Allele(location_id="VMC:GL_L1IS6jOwSUsOpKihGRcqxHul1IwbV-1s", state="C")
allele.id = vmc.computed_id(allele)
dump(allele)

<Allele|VMC:GL_L1IS6jOwSUsOpKihGRcqxHul1IwbV-1s|C>
zsJuMckKGajqHCl16sxKQJtBMjGrFHHZ
VMC:GA_zsJuMckKGajqHCl16sxKQJtBMjGrFHHZ
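The Location and Allele cells show identifiers being chained: the Location is serialized and digested, and the resulting identifier is embedded in the Allele's serialization before that object is digested in turn. The sketch below reproduces the chain with plain strings. It assumes that an identifier is `VMC:` plus a type prefix plus the truncated SHA-512 digest of the UTF-8 serialization; the `PREFIXES` table is read off the identifiers printed in this notebook, and the sketch does not claim to reproduce the exact digest values above.

```python
import base64
import hashlib

# Assumed type prefixes, inferred from the identifiers shown in this notebook.
PREFIXES = {"Location": "GL", "Allele": "GA"}


def digest(text: str) -> str:
    """Assumed digest: SHA-512 truncated to 24 bytes, base64url-encoded."""
    raw = hashlib.sha512(text.encode("utf-8")).digest()[:24]
    return base64.urlsafe_b64encode(raw).decode("ascii")


def computed_id(type_name: str, serialization: str) -> str:
    """Build an identifier from a type name and a serialized object."""
    return "VMC:{}_{}".format(PREFIXES[type_name], digest(serialization))


interval = "<Interval|44908683|44908684>"
location = "<Location|{}|{}>".format(
    "VMC:GS_IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl", interval
)
location_id = computed_id("Location", location)

# The Allele refers to its Location by identifier, not by value.
allele = "<Allele|{}|C>".format(location_id)
allele_id = computed_id("Allele", allele)

print(location)
print(location_id)
print(allele)
print(allele_id)
```

Because each object embeds the identifiers of the objects it refers to, a change to the interval would propagate into the Location identifier and from there into the Allele identifier.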


In [14]:
haplotype = vmc.models.Haplotype(completeness="COMPLETE",
                                 allele_ids=["VMC:GA_zsJuMckKGajqHCl16sxKQJtBMjGrFHHZ", 
                                             "VMC:GA__8rLiy7YkQDNy-t536RpVFGxIDiWLr6J"])
haplotype.id = vmc.computed_id(haplotype)
dump(haplotype)

<Haplotype||COMPLETE|[VMC:GA__8rLiy7YkQDNy-t536RpVFGxIDiWLr6J;VMC:GA_zsJuMckKGajqHCl16sxKQJtBMjGrFHHZ]>
xk_4sKZKfwD7ol3H89mDShrBT3dfu5Aq
VMC:GH_xk_4sKZKfwD7ol3H89mDShrBT3dfu5Aq
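In the Haplotype output above, the allele identifiers appear in a different order than they were passed in, and the field after `Haplotype` is empty. One reading, sketched below, is that member identifiers are sorted before serialization so that the digest does not depend on input order, and that the empty field is an unset location identifier. Both points are inferred from the output rather than confirmed, and `serialize_haplotype` is a hypothetical helper, not the `vmc` implementation.

```python
def serialize_haplotype(completeness, allele_ids, location_id=""):
    """Serialize a haplotype with its allele identifiers in sorted order."""
    # Sorting gives the same serialization for any ordering of the same members.
    members = ";".join(sorted(allele_ids))
    return "<Haplotype|{}|{}|[{}]>".format(location_id, completeness, members)


ids = [
    "VMC:GA_zsJuMckKGajqHCl16sxKQJtBMjGrFHHZ",
    "VMC:GA__8rLiy7YkQDNy-t536RpVFGxIDiWLr6J",
]
print(serialize_haplotype("COMPLETE", ids))
print(serialize_haplotype("COMPLETE", list(reversed(ids))))
```

Both calls print the same string, which is the property that makes the computed identifier stable.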


In [15]:
genotype = vmc.models.Genotype(completeness="COMPLETE",
                               haplotype_ids=["VMC:GH_xk_4sKZKfwD7ol3H89mDShrBT3dfu5Aq", 
                                              "VMC:GH_xk_4sKZKfwD7ol3H89mDShrBT3dfu5Aq"])
genotype.id = vmc.computed_id(genotype)
dump(genotype)

<Genotype|COMPLETE|[VMC:GH_xk_4sKZKfwD7ol3H89mDShrBT3dfu5Aq;VMC:GH_xk_4sKZKfwD7ol3H89mDShrBT3dfu5Aq]>
Pv97fICMeVRmowtCwioFpoFmrsOkZ7es
VMC:GG_Pv97fICMeVRmowtCwioFpoFmrsOkZ7es
