# Study of Ids and Identifiers
This notebook demonstrates the use and implications of assigned versus computed logical ids.

* **logical id (id)**: internal identifier used to refer to an object
* **business identifier (identifier)**: identifier used across systems to establish equivalence
* **identical**: two objects are identical if they refer to the same structure in memory
* **equal**: two objects are equal if they have the same value; two structures are equal if their corresponding elements are equal
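
The difference between *identical* and *equal* corresponds to Python's `is` and `==` operators. The following is a minimal sketch that uses a hypothetical `Point` dataclass (not part of `vmc`) purely to illustrate the two definitions above:

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical structure used only for illustration; a dataclass
    # generates an __eq__ that compares instances field by field.
    x: int
    y: int


a = Point(1, 2)
b = Point(1, 2)  # a separate object holding the same value
c = a            # a second name for the same object

equal_ab = (a == b)      # corresponding elements are equal
identical_ab = (a is b)  # two distinct structures in memory
identical_ac = (a is c)  # both names refer to one structure

print(equal_ab, identical_ab, identical_ac)  # prints: True False True
```

Equality is a statement about values, so it survives copying and serialization; identity is a statement about one particular in-memory object and does not.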

In [1]:
import pprint
import vmc
from vmc.richmodels import Interval, Locus, Allele, Haplotype, Genotype
from vmc.utils import multimap

In [2]:
sr = "NCBI:NC_000019.10"

intervals = {
    "rs429358": Interval(44908683, 44908684),
    "rs7412": Interval(44908821, 44908822),
    }

# logical ids are assigned at object creation if not explicitly defined
def create_alleles():
    return [
        Allele(sr, intervals["rs429358"], "T", identifiers=["myid:1"]),
        Allele(sr, intervals["rs429358"], "T", identifiers=["myid:2"]),
    ]

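def print_alleles(id_function):
    # select the scheme used to generate logical ids for newly created objects
    vmc.richmodels.id_function = id_function
    alleles = create_alleles()
    # build both directions: business identifier -> logical ids, and logical id -> business identifiers
    identifier_id_map = multimap((i, a.id) for a in alleles for i in a.identifiers)
    id_identifier_map = multimap((a.id, i) for a in alleles for i in a.identifiers)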
    doc = {
        "alleles": [a.as_dict() for a in alleles],
        "identfier_id_map": identifier_id_map,
        "id_identfier_map": id_identifier_map,
    }
    pprint.pprint(doc)

Each cell below generates two Alleles that are *equal* but not *identical*. For example, this might occur in a system analyzing an individual who carries the same genomic variant on both the maternal and paternal strands.

With an assigned (i.e., not computed) logical id, such as a UUID or a serial number, equal objects receive different ids, so the *equivalence* between them is not apparent. With a computed id, equal objects receive the same id, and the equivalence is readily visible in the maps of ids to identifiers and identifiers to ids.
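
The contrast can be sketched with the standard library alone. This is not how `vmc` generates its ids: the record layout, the `EX_` prefix, the `|`-joined serialization, and the truncated SHA-512 digest are all assumptions chosen for illustration. The only point is that an id derived from an object's content is the same for equal objects, while an assigned id is not.

```python
import base64
import hashlib
import itertools
import uuid
from collections import defaultdict

# Hypothetical stand-ins for two equal Alleles. The "content" tuple is
# (seqref, start, end, replacement); business identifiers are kept
# separate and are deliberately NOT part of what gets hashed.
records = [
    {"content": ("NCBI:NC_000019.10", 44908683, 44908684, "T"),
     "identifiers": ["myid:1"]},
    {"content": ("NCBI:NC_000019.10", 44908683, 44908684, "T"),
     "identifiers": ["myid:2"]},
]

_counter = itertools.count(1)


def assigned_uuid(content):
    """Assigned id: random, ignores the content entirely."""
    return str(uuid.uuid4())


def assigned_serial(content):
    """Assigned id: next number in a sequence, ignores the content."""
    return "EX_{:06d}".format(next(_counter))


def computed_digest(content):
    """Computed id: derived only from the content (assumed scheme)."""
    serialized = "|".join(str(field) for field in content).encode("utf-8")
    digest = hashlib.sha512(serialized).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def id_to_identifiers(id_function):
    """Map each generated id to the business identifiers that share it."""
    mapping = defaultdict(list)
    for record in records:
        mapping[id_function(record["content"])].extend(record["identifiers"])
    return dict(mapping)


uuid_map = id_to_identifiers(assigned_uuid)
serial_map = id_to_identifiers(assigned_serial)
digest_map = id_to_identifiers(computed_digest)

# Number of distinct ids produced for the two equal records.
print(len(uuid_map), len(serial_map), len(digest_map))  # prints: 2 2 1
```

In this sketch only the digest-based map groups `myid:1` and `myid:2` under a single key, which is the same pattern that separates the assigned-id and computed-id outputs in the cells that follow.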

In [3]:
print_alleles("uuid")

{'alleles': [{'id': '95c34784-2296-4296-9b99-77cb29d2f7ad',
              'identifiers': ['myid:1'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'},
             {'id': '1a452731-65ee-4427-b5ec-ca0b5f72bc2a',
              'identifiers': ['myid:2'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'}],
 'id_identfier_map': {'1a452731-65ee-4427-b5ec-ca0b5f72bc2a': ['myid:2'],
                      '95c34784-2296-4296-9b99-77cb29d2f7ad': ['myid:1']},
 'identfier_id_map': {'myid:1': ['95c34784-2296-4296-9b99-77cb29d2f7ad'],
                      'myid:2': ['1a452731-65ee-4427-b5ec-ca0b5f72bc2a']}}


In [4]:
print_alleles("serial")

{'alleles': [{'id': 'VMC_000001',
              'identifiers': ['myid:1'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'},
             {'id': 'VMC_000002',
              'identifiers': ['myid:2'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'}],
 'id_identfier_map': {'VMC_000001': ['myid:1'], 'VMC_000002': ['myid:2']},
 'identfier_id_map': {'myid:1': ['VMC_000001'], 'myid:2': ['VMC_000002']}}


In [5]:
print_alleles("digest")

{'alleles': [{'id': 'DriWL4GHjx',
              'identifiers': ['myid:1'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'},
             {'id': 'DriWL4GHjx',
              'identifiers': ['myid:2'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'}],
 'id_identfier_map': {'DriWL4GHjx': ['myid:1', 'myid:2']},
 'identfier_id_map': {'myid:1': ['DriWL4GHjx'], 'myid:2': ['DriWL4GHjx']}}


In [6]:
print_alleles("ci")

{'alleles': [{'id': 'GA:s9484RoL0-BQlf1sppO7HmDriWL4GHjx',
              'identifiers': ['myid:1'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'},
             {'id': 'GA:s9484RoL0-BQlf1sppO7HmDriWL4GHjx',
              'identifiers': ['myid:2'],
              'location': {'end': 44908684, 'start': 44908683},
              'replacement': 'T',
              'seqref': 'NCBI:NC_000019.10'}],
 'id_identfier_map': {'GA:s9484RoL0-BQlf1sppO7HmDriWL4GHjx': ['myid:1',
                                                              'myid:2']},
 'identfier_id_map': {'myid:1': ['GA:s9484RoL0-BQlf1sppO7HmDriWL4GHjx'],
                      'myid:2': ['GA:s9484RoL0-BQlf1sppO7HmDriWL4GHjx']}}
