## Using the MIRA Domain Knowledge Graph REST API

MIRA implements an approach to rapidly generate domain-specific knowledge graphs (DKGs) from primary sources such as available ontologies, in support of modeling. Though there can be different DKGs for different scientific domains in which modeling is performed, the technical APIs for DKGs are shared across domains.

Below we demonstrate the MIRA DKG REST API on an epidemiology DKG that integrates about a dozen different relevant ontologies.

### Node representation
Nodes in the DKG are labeled using compact URIs (CURIEs). For example, the node representing the Infectious Disease Ontology (IDO) entry 0000556 has the label `ido:0000556`. Here, the `ido` prefix is the standard prefix defined for IDO in the Bioregistry: https://bioregistry.io/registry/ido.
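
To make the CURIE convention concrete, here is a minimal sketch that splits a CURIE into its prefix and local identifier. The helper names `split_curie` and `registry_page` are hypothetical and not part of MIRA, and the registry URL pattern is assumed to generalize from the IDO link above to other prefixes.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE such as 'ido:0000556' into (prefix, local identifier)."""
    # Split on the first colon only, so identifiers containing ':' stay intact.
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"Not a valid CURIE: {curie!r}")
    return prefix, identifier


def registry_page(curie: str) -> str:
    """Build the Bioregistry page URL for a CURIE's prefix (assumed URL pattern)."""
    prefix, _ = split_curie(curie)
    return f"https://bioregistry.io/registry/{prefix}"


prefix, identifier = split_curie("ido:0000556")
print(prefix, identifier)            # ido 0000556
print(registry_page("ido:0000556"))  # https://bioregistry.io/registry/ido
```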

In [1]:
import requests

Below we set the base URL of a public MIRA epidemiology DKG instance. This URL may change in the future.

In [23]:
base = "http://mira-epi-dkg-lb-dc1e19b273dedaa2.elb.us-east-1.amazonaws.com/api"
# base = "http://localhost/api"  # Local deployment

### Export lexical information from the DKG
There is a dedicated endpoint for exporting all lexical information (names, synonyms, descriptions) for each DKG node. This can be useful for systems that do information extraction from unstructured sources and attempt to do named entity recognition, normalization, and disambiguation.

In [3]:
res = requests.get(base + "/lexical")

The result is a list of dictionaries, one per node. Each dictionary contains the node's CURIE (`id`), its standard name (`name`), its list of synonyms (`synonyms`) and, when available, its description (`description`).

In [4]:
res.json()[20000:20005]

[{'id': 'wikidata:Q98642859',
  'name': 'electronvolt square metre per kilogram',
  'description': 'unit of total mass stopping power',
  'synonyms': []},
 {'id': 'wikidata:Q98643033',
  'name': 'joule square metre per kilogram',
  'description': 'unit of total mass stopping power',
  'synonyms': []},
 {'id': 'wikidata:Q98793302', 'name': 'quart (UK)', 'synonyms': []},
 {'id': 'wikidata:Q98793408', 'name': 'liquid quart (US)', 'synonyms': []},
 {'id': 'wikidata:Q98793687',
  'name': 'dry quart (US)',
  'description': 'unit of volume',
  'synonyms': []}]
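
A system doing entity normalization would typically turn these records into a lookup table from text to CURIEs. The following is a minimal sketch of that idea, not part of the MIRA API: `build_name_index` is a hypothetical helper, and the sample records are copied from the output above. Since the synonym lists shown here are empty, the shape of a synonym entry is an assumption; the sketch accepts either a plain string or a dictionary with a `value` key.

```python
from collections import defaultdict

# Two records copied from the /lexical output shown above.
sample_records = [
    {"id": "wikidata:Q98793302", "name": "quart (UK)", "synonyms": []},
    {
        "id": "wikidata:Q98793687",
        "name": "dry quart (US)",
        "description": "unit of volume",
        "synonyms": [],
    },
]


def build_name_index(records):
    """Map case-folded names and synonyms to the set of CURIEs that use them."""
    index = defaultdict(set)
    for record in records:
        name = record.get("name")
        if name:
            index[name.casefold()].add(record["id"])
        for synonym in record.get("synonyms", []):
            # Assumption: a synonym is either a string or a dict with a 'value' key.
            text = synonym["value"] if isinstance(synonym, dict) else synonym
            index[text.casefold()].add(record["id"])
    return dict(index)


name_index = build_name_index(sample_records)
print(name_index["quart (uk)"])  # {'wikidata:Q98793302'}
```

Mapping to a set rather than a single CURIE keeps ambiguous names visible, which matters for the disambiguation step.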

### Structured graph pattern queries in the DKG
We next look at REST API queries to the DKG that return matches based on simple structural patterns in the graph.

In [18]:
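def query(payload, **_payload):
    """POST a relation query to the DKG and return the parsed JSON response."""
    payload = {**payload, **_payload}  # merge without mutating the caller's dict
    res = requests.post(base + "/relations", json=payload)
    res.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return res.json()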

#### Find relations with a given type of source node
Example: Query for relations with Vaccine Ontology (vo) source nodes

In [6]:
query({"source_type": "vo", "limit": 2})

[{'subject': 'vo:0000000',
  'predicate': 'rdfs:subClassOf',
  'object': 'vo:0000420'},
 {'subject': 'vo:0000000',
  'predicate': 'rdfs:subClassOf',
  'object': 'vo:0000420'}]
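
The two rows returned above are identical. Later cells in this notebook pass `"distinct": True` in the payload; as a client-side alternative, repeated rows can be dropped after the fact. This is a minimal sketch with a hypothetical helper, `dedupe_relations`, and it assumes results in the compact form (CURIE strings rather than full node dictionaries).

```python
def dedupe_relations(relations):
    """Drop repeated (subject, predicate, object) rows, keeping first-seen order."""
    seen = set()
    unique = []
    for rel in relations:
        predicate = rel["predicate"]
        # Multi-hop results carry a list of predicates; convert to a hashable tuple.
        if isinstance(predicate, list):
            predicate = tuple(predicate)
        key = (rel["subject"], predicate, rel["object"])
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique


# Rows copied from the output above.
rows = [
    {"subject": "vo:0000000", "predicate": "rdfs:subClassOf", "object": "vo:0000420"},
    {"subject": "vo:0000000", "predicate": "rdfs:subClassOf", "object": "vo:0000420"},
]
print(len(dedupe_relations(rows)))  # 1
```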

#### Find relations with a given type of target node
Example: Query for relations with Symptom Ontology (symp) target nodes

In [7]:
query({"target_type": "symp", "limit": 2})

[{'subject': 'doid:0060859',
  'predicate': 'ro:0002452',
  'object': 'symp:0000001'},
 {'subject': 'symp:0000375',
  'predicate': 'rdfs:subClassOf',
  'object': 'symp:0000001'}]

#### Find relations between a given type of source node and target node
Example: Query for relations from Disease Ontology (doid) to Symptom Ontology (symp) nodes

In [8]:
query({"source_type": "doid", "target_type": "symp", "limit": 2})

[{'subject': 'doid:0060859',
  'predicate': 'ro:0002452',
  'object': 'symp:0000001'},
 {'subject': 'doid:0060188',
  'predicate': 'ro:0002452',
  'object': 'symp:0000001'}]

#### Find relations with a specific source node
Example: Query for relations whose start node is dientamoebiasis (doid:946).

In [9]:
query({"source_curie": "doid:946", "limit": 2})

[{'subject': 'doid:946', 'predicate': 'ro:0002452', 'object': 'symp:0019177'},
 {'subject': 'doid:946', 'predicate': 'ro:0002452', 'object': 'symp:0000570'}]
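
Results like these are often easier to work with once grouped by relation type. Here is a minimal sketch, using the rows from the output above and a hypothetical helper `objects_by_predicate`; it assumes single-hop results in which `predicate` is a string.

```python
from collections import defaultdict


def objects_by_predicate(relations):
    """Group the target CURIEs of single-hop relations by their predicate."""
    grouped = defaultdict(list)
    for rel in relations:
        grouped[rel["predicate"]].append(rel["object"])
    return dict(grouped)


# Rows copied from the output above.
rows = [
    {"subject": "doid:946", "predicate": "ro:0002452", "object": "symp:0019177"},
    {"subject": "doid:946", "predicate": "ro:0002452", "object": "symp:0000570"},
]
print(objects_by_predicate(rows))
# {'ro:0002452': ['symp:0019177', 'symp:0000570']}
```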

#### Find relations with a specific target node
Example: Query for relations whose target node is diarrhea (symp:0000570).

In [10]:
query({"target_curie": "symp:0000570", "limit": 2})

[{'subject': 'symp:0020011',
  'predicate': 'rdfs:subClassOf',
  'object': 'symp:0000570'},
 {'subject': 'symp:0000738',
  'predicate': 'rdfs:subClassOf',
  'object': 'symp:0000570'}]

#### Adding relation type constraints
You can extend the examples above with constraints on the types of relations considered, in addition to source and target constraints. For example, the `rdfs:subClassOf` relation type selects relations that represent taxonomical subclasses.

Example: Query for `rdfs:subClassOf` relations whose source node is a term in the Basic Formal Ontology (bfo:0000002).

In [11]:
query({"source_curie": "bfo:0000002", "relation": "rdfs:subClassOf", "limit": 2})

[{'subject': 'bfo:0000002',
  'predicate': 'rdfs:subClassOf',
  'object': 'bfo:0000001'},
 {'subject': 'bfo:0000002',
  'predicate': 'rdfs:subClassOf',
  'object': 'bfo:0000001'}]

#### Adding constraints on path length
You can also specify the maximum path length ("number of hops") surrounding a node for a query.

Example: Find subclass relations of bfo:0000002 that are at most 2 hops away.

In [12]:
query(
    {
        "source_curie": "bfo:0000002",
        "relation": "rdfs:subClassOf",
        "relation_max_hops": 2,
        "limit": 2,
    }
)

[{'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf'],
  'object': 'bfo:0000001'},
 {'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf', 'rdfs:subClassOf'],
  'object': 'owl:Thing'}]
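
Note that when `relation_max_hops` is given, `predicate` is a list with one entry per hop, whereas the earlier single-hop results return it as a string. The sketch below derives a hop count from either form; `hop_count` is a hypothetical helper and the rows are copied from the output above.

```python
def hop_count(relation):
    """Return the number of hops in a result row (1 for a plain string predicate)."""
    predicate = relation["predicate"]
    return len(predicate) if isinstance(predicate, list) else 1


# Rows copied from the output above.
rows = [
    {"subject": "bfo:0000002", "predicate": ["rdfs:subClassOf"], "object": "bfo:0000001"},
    {
        "subject": "bfo:0000002",
        "predicate": ["rdfs:subClassOf", "rdfs:subClassOf"],
        "object": "owl:Thing",
    },
]
for rel in sorted(rows, key=hop_count):
    print(hop_count(rel), rel["object"])
# 1 bfo:0000001
# 2 owl:Thing
```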

In [13]:
# Query for a specific source + relation over a variable number of hops;
# relation_max_hops=0 removes the hop limit (explained in the next section)
query(
    {
        "source_curie": "bfo:0000002",
        "relation": "rdfs:subClassOf",
        "relation_max_hops": 0,
        "distinct": True,
    }
)

[{'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf'],
  'object': 'bfo:0000001'},
 {'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf', 'rdfs:subClassOf'],
  'object': 'owl:Thing'},
 {'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf', 'bfo:0000108'],
  'object': 'bfo:0000008'},
 {'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf', 'bfo:0000108', 'rdfs:subClassOf'],
  'object': 'bfo:0000003'},
 {'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf',
   'bfo:0000108',
   'rdfs:subClassOf',
   'rdfs:subClassOf'],
  'object': 'bfo:0000001'},
 {'subject': 'bfo:0000002',
  'predicate': ['rdfs:subClassOf',
   'bfo:0000108',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf'],
  'object': 'owl:Thing'}]
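
The output above includes paths that traverse `bfo:0000108` in addition to `rdfs:subClassOf`. If only paths made up entirely of certain relation types are of interest, they can be filtered on the client. This is a minimal sketch with a hypothetical helper, `only_via`, applied to the first three rows of the output above; it assumes multi-hop results in which `predicate` is a list.

```python
def only_via(relations, allowed):
    """Keep paths whose every hop uses one of the allowed relation types."""
    allowed = set(allowed)
    return [rel for rel in relations if set(rel["predicate"]) <= allowed]


# First three rows copied from the output above.
rows = [
    {"subject": "bfo:0000002", "predicate": ["rdfs:subClassOf"], "object": "bfo:0000001"},
    {
        "subject": "bfo:0000002",
        "predicate": ["rdfs:subClassOf", "rdfs:subClassOf"],
        "object": "owl:Thing",
    },
    {
        "subject": "bfo:0000002",
        "predicate": ["rdfs:subClassOf", "bfo:0000108"],
        "object": "bfo:0000008",
    },
]
pure = only_via(rows, ["rdfs:subClassOf"])
print([rel["object"] for rel in pure])  # ['bfo:0000001', 'owl:Thing']
```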

#### Querying over unconstrained path lengths
Setting `relation_max_hops` to 0 removes the limit on path length: the query returns all paths matching the given constraints, irrespective of length.

Example: Get all taxonomical ancestors of a given node (doid:946). (For the sake of running this example faster, we limit the number of results to 10.)

In [24]:
query(
    {
        "source_curie": "doid:946",
        "relation": ["rdfs:subClassOf", "part_of"],
        "relation_max_hops": 0,
        "distinct": True,
        "limit": 10,
    }
)

[{'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf'],
  'object': 'doid:2789'},
 {'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf', 'rdfs:subClassOf'],
  'object': 'doid:1398'},
 {'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf', 'rdfs:subClassOf', 'rdfs:subClassOf'],
  'object': 'doid:0050117'},
 {'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf'],
  'object': 'doid:4'},
 {'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf'],
  'object': 'ogms:0000031'},
 {'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf'],
  'object': 'bfo:0000016'},
 {'subject': 'doid:946',
  'predicate': ['rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   'rdfs:subClassOf',
   '

#### Including node properties in results
You can set `"full": True` in the payload to return all node properties (name, etc.) rather than only node CURIEs. Use this with care, since the payload can become large and is often redundant.

Example: Find relations whose target is symp:0000570 with full node details.

In [25]:
query({"target_curie": "symp:0000570", "limit": 2, "full": True})

[{'subject': {'id': 'symp:0020011',
   'name': 'bloody diarrhea',
   'type': 'class',
   'obsolete': False,
   'description': None,
   'synonyms': [],
   'alts': [],
   'xrefs': [],
   'labels': [],
   'properties': {}},
  'predicate': {'pred': 'rdfs:subClassOf',
   'source': 'symp',
   'version': '2022-10-20',
   'graph': 'http://purl.obolibrary.org/obo/symp.owl'},
  'object': {'id': 'symp:0000570',
   'name': 'diarrhea',
   'type': 'class',
   'obsolete': False,
   'description': 'Diarrhea is a feces and droppng symptom involving the abnormally frequent intestinal evacuations with more or less fluid stools.',
   'synonyms': [{'value': 'the runs', 'type': 'oboinowl:hasExactSynonym'},
    {'value': 'diarrhoea', 'type': 'oboinowl:hasExactSynonym'},
    {'value': 'loose bowels', 'type': 'oboinowl:hasExactSynonym'},
    {'value': 'loose bowel', 'type': 'oboinowl:hasExactSynonym'},
    {'value': 'bacterial gastroenteritis', 'type': 'oboinowl:hasExactSynonym'},
    {'value': 'fecal incontin