# MedGen Exploration

Here we explore MedGen using the ontobio library. In particular, we examine:

 - loops within MedGen (inherited from UMLS, the source that MedGen extends)
 - singletons in MedGen (i.e. nodes that are not placed in a hierarchy, particularly Mendelian disease nodes)
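
As a rough illustration of the first check, the sketch below finds the nodes that sit on a loop in a toy is-a graph. It uses plain Python rather than ontobio, and the node IDs, `toy_parents`, and `find_loop_members` are made up for this example. The repeated search is quadratic in the worst case, so a graph the size of MedGen would more likely call for a strongly-connected-components algorithm.

```python
# Toy is-a edges (child -> list of parents); the IDs are hypothetical.
toy_parents = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],  # closes the loop A -> B -> C -> A
    "D": ["C"],  # points into the loop but is not part of it
    "E": [],     # no parents at all
}


def find_loop_members(parents):
    """Return the set of nodes that can reach themselves via parent edges."""

    def reachable(start):
        # Iterative depth-first search over parent edges
        seen = set()
        stack = list(parents.get(start, []))
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, []))
        return seen

    return {n for n in parents if n in reachable(n)}


loop_members = find_loop_members(toy_parents)
print(sorted(loop_members))
```

Here `D` is not reported: it has an ancestor inside the loop, but it never reaches itself.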

For this analysis we use a cut of MedGen that preserves only the Disease-or-Syndrome and Neoplastic subsets. See the Makefile for details.

**TODO**: rerun with the latest MedGen

In [1]:
# Import the ontology factory
from ontobio.ontol_factory import OntologyFactory


In [2]:
## Create an ontology factory and load the Disease-or-Syndrome + Neoplastic subsets
ofactory = OntologyFactory()
ont = ofactory.create("medgen-disease-extract.json")

In [4]:
## number of classes in medgen
len(ont.nodes())

113450

In [5]:
ont.nodes()[0:20]

['http://purl.obolibrary.org/obo/UMLS_C0267771',
 'http://purl.obolibrary.org/obo/UMLS_C1290542',
 'http://purl.obolibrary.org/obo/UMLS_C4227763',
 'http://purl.obolibrary.org/obo/UMLS_CN227583',
 'http://purl.obolibrary.org/obo/UMLS_CN200190',
 'http://purl.obolibrary.org/obo/UMLS_C1857040',
 'http://purl.obolibrary.org/obo/UMLS_C1298000',
 'http://purl.obolibrary.org/obo/UMLS_C2751292',
 'http://purl.obolibrary.org/obo/UMLS_C1858273',
 'http://purl.obolibrary.org/obo/UMLS_C1848918',
 'http://purl.obolibrary.org/obo/UMLS_C0263580',
 'http://purl.obolibrary.org/obo/UMLS_C0406325',
 'http://purl.obolibrary.org/obo/UMLS_C3806334',
 'http://purl.obolibrary.org/obo/UMLS_C0346597',
 'http://purl.obolibrary.org/obo/UMLS_C0267429',
 'http://purl.obolibrary.org/obo/UMLS_C1283807',
 'http://purl.obolibrary.org/obo/UMLS_C1333957',
 'http://purl.obolibrary.org/obo/UMLS_C1859402',
 'http://purl.obolibrary.org/obo/UMLS_C1517392',
 'http://purl.obolibrary.org/obo/UMLS_C4081754']

In [6]:
## Find a particular node by name
[d] = ont.search('Acanthosis nigricans')

In [7]:
ancs = ont.ancestors(d, 'subClassOf')
ancs

{'http://purl.obolibrary.org/obo/UMLS_C0011603',
 'http://purl.obolibrary.org/obo/UMLS_C0012634',
 'http://purl.obolibrary.org/obo/UMLS_C0037274',
 'http://purl.obolibrary.org/obo/UMLS_C1333305',
 'http://purl.obolibrary.org/obo/UMLS_C1335042',
 'http://purl.obolibrary.org/obo/UMLS_C1709246',
 'http://purl.obolibrary.org/obo/UMLS_C1709247'}

In [8]:
ancs.add(d)
subont = ont.subontology(ancs, relations=['subClassOf'])


In [9]:
from ontobio.io.ontol_renderers import GraphRenderer
w = GraphRenderer.create('png')
w.outfile = "output/acanth.png"
w.write(subont, query_ids=[d])

![img](output/acanth.png)

In [22]:
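# Restrict the graph to is-a (subClassOf) edges
isaG = ont.get_filtered_graph(relations=['subClassOf'])

def is_singleton(g, n):
    # A singleton has no incoming and no outgoing is-a edges.
    # Degree checks work whether predecessors()/successors() return lists or iterators.
    return g.in_degree(n) == 0 and g.out_degree(n) == 0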

singletons = [n for n in isaG.nodes() if is_singleton(isaG,n)]

print(len(singletons))
[ont.label(x) for x in singletons[0:20] if ont.label(x) is not None]


98426


['Generalized enamel hypoplasia associated with ingestion of drugs',
 'Localized lipodystrophy',
 'Ectrodactyly-polydactyly syndrome',
 'Malignant tumor involving right ovary by direct extension from left ovary',
 'Hereditary hypotrichosis with recurrent skin vesicles',
 'Ichthyosis hystrix',
 'Interstitial Pregnancy',
 'Malignant neoplasm of cartilage of trachea',
 'Dietetic ileitis',
 'Subacute autoimmune thyroiditis',
 'Giant cell epulis']
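
The count above means that 98426 of the 113450 classes (roughly 87%) have no is-a edge at all. The sketch below restates the singleton test on a toy edge list in plain Python, without ontobio. The node names and the `find_singletons` helper are made up for illustration.

```python
# Hypothetical toy data: (child, parent) is-a edges plus the full node list.
toy_edges = [("child1", "parent1"), ("child2", "parent1")]
toy_nodes = ["child1", "child2", "parent1", "lonely1", "lonely2"]


def find_singletons(nodes, edges):
    """Return the nodes that appear in no edge, preserving input order."""
    connected = set()
    for child, parent in edges:
        connected.add(child)
        connected.add(parent)
    return [n for n in nodes if n not in connected]


toy_singletons = find_singletons(toy_nodes, toy_edges)
fraction = len(toy_singletons) / len(toy_nodes)
print(toy_singletons, fraction)
```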

In [37]:
def omim_xrefs(c):
    return [x for x in ont.xrefs(c) if x.startswith("OMIM:")]

HEM = 'HEMOPHILIA A WITH VASCULAR ABNORMALITY'
[c] = ont.search(HEM)
#singleton_omims = [x for  s in singletons for x in omim_xrefs(s) if s is not None ]
#singleton_omims[0:100]
ont.xrefs(c)
omim_xrefs(c)

['OMIM:306800']

In [51]:
omim_orphans = set()
for c in singletons:
    omim_orphans.update( omim_xrefs(c) )
omim_orphans = list(omim_orphans)
print(len(omim_orphans))
omim_orphans[0:20]

9722


['OMIM:601536',
 'OMIM:616723',
 'OMIM:180040',
 'OMIM:311895',
 'OMIM:608363',
 'OMIM:141405',
 'OMIM:609948',
 'OMIM:136350',
 'OMIM:236900',
 'OMIM:107290',
 'OMIM:616839',
 'OMIM:187330',
 'OMIM:116300',
 'OMIM:607600',
 'OMIM:109740',
 'OMIM:115400',
 'OMIM:612163',
 'OMIM:606693',
 'OMIM:104225',
 'OMIM:300226']

In [36]:
c

'http://purl.obolibrary.org/obo/UMLS_C1844137'

## Load MonDO

First we map the PURLs used in MonDO to the CURIEs used in MedGen.

In [65]:
import prefixcommons
prefixcommons.curie_util.default_curie_maps
prefixcommons.curie_util.default_curie_maps[0]['OMIM'] = 'http://purl.obolibrary.org/obo/OMIM_'
prefixcommons.curie_util.expand_uri("OMIM:1")

'http://purl.obolibrary.org/obo/OMIM_1'
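
To make the prefix mapping concrete, here is a minimal sketch of CURIE expansion and contraction using a plain dictionary. It is not the prefixcommons implementation. `PREFIX_MAP`, `expand`, and `contract` are names invented for this example, and the two base URIs are the ones that appear elsewhere in this notebook.

```python
# Hypothetical prefix map; the base URIs are those seen in this notebook.
PREFIX_MAP = {
    "OMIM": "http://purl.obolibrary.org/obo/OMIM_",
    "UMLS": "http://purl.obolibrary.org/obo/UMLS_",
}


def expand(curie, prefix_map=PREFIX_MAP):
    """Turn 'PREFIX:local' into a full URI; return the input if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + local_id
    return curie


def contract(uri, prefix_map=PREFIX_MAP):
    """Turn a full URI into 'PREFIX:local'; return the input if no base matches."""
    for prefix, base in prefix_map.items():
        if uri.startswith(base):
            return prefix + ":" + uri[len(base):]
    return uri


print(expand("OMIM:306800"))
print(contract("http://purl.obolibrary.org/obo/UMLS_C1844137"))
```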

In [66]:
mondo = ofactory.create("mondo.json")

## Finding an OMIM class in MonDO

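We define a lookup function that returns the OMIM ID itself if it is a node in MonDO, and otherwise any classes linked to it through xrefs.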

In [67]:
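def lookup(omim_id):
    # The OMIM ID is itself a node in MonDO
    if omim_id in mondo.get_graph().nodes():
        return [omim_id]
    # Otherwise fall back to classes linked to it via xrefs
    if omim_id in mondo.xref_graph:
        # list() so callers can concatenate results even if neighbors() is an iterator
        return list(mondo.xref_graph.neighbors(omim_id))
    return []
lookup(omim_orphans[0])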


['OMIM:601536']

In [103]:
[y for x in omim_orphans[0:10] for y in lookup(x) ]

['OMIM:601536',
 'OMIM:616723',
 'OMIM:311895',
 'OMIM:608363',
 'OMIM:141405',
 'OMIM:236900',
 'OMIM:107290']
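
Only seven results come back for the first ten IDs, so some OMIM IDs resolve to nothing. The sketch below mimics the two-step lookup (direct node first, then xref neighbours) with plain dictionaries so that unresolved IDs can be listed explicitly. All IDs, `toy_node_ids`, `toy_xref_map`, and `toy_lookup` are hypothetical stand-ins, not MonDO data or ontobio calls.

```python
# Hypothetical stand-ins for the MonDO node set and its xref graph.
toy_node_ids = {"OMIM:1", "TOY:10"}
toy_xref_map = {"OMIM:2": ["TOY:10"]}


def toy_lookup(omim_id, node_ids, xref_map):
    """Return the ID itself if it is a node, else its xref neighbours, else []."""
    if omim_id in node_ids:
        return [omim_id]
    return list(xref_map.get(omim_id, []))


queries = ["OMIM:1", "OMIM:2", "OMIM:3"]
resolved = {q: toy_lookup(q, toy_node_ids, toy_xref_map) for q in queries}
unresolved = [q for q, hits in resolved.items() if not hits]
print(resolved)
print(unresolved)
```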

In [104]:
## Find CMT4K in MedGen
CMT4K = 'CHARCOT-MARIE-TOOTH DISEASE, TYPE 4K'
[x] = ont.search(CMT4K)
x

'http://purl.obolibrary.org/obo/UMLS_C4225246'

In [110]:
## Check that this is an orphan
is_singleton(isaG, x)

True

In [114]:
## Look up OMIM xrefs in MedGen
omims = omim_xrefs(x)
omims

['OMIM:616684', 'OMIM:185620']

In [126]:
## Map OMIM xrefs to MonDO IDs (either equivalent or close)
mondo_ids = []
for x in omims:
    mondo_ids = mondo_ids + lookup(x)
mondo_ids

['OMIM:616684']

In [117]:
ancs = mondo.traverse_nodes(mondo_ids, relations='subClassOf')
subont = mondo.subontology(ancs)
w.outfile = "output/mondo-cmt4k.png"
w.write(subont, query_ids=mondo_ids)

As can be seen, the OMIM ID for CMT4K has been *patched in* to a hierarchy woven from Orphanet and DOID.

![img](output/mondo-cmt4k.png)

## Dynamic Mapping

In [128]:
Craniosynostosis = 'Craniosynostosis'
matches = ont.search(Craniosynostosis)
matches

['http://purl.obolibrary.org/obo/UMLS_C0010278',
 'http://purl.obolibrary.org/obo/UMLS_CN241055']

In [131]:
ancs = ont.traverse_nodes(matches, relations='subClassOf')
subont = ont.subontology(ancs)
w.outfile = "output/medgen-crn.png"
w.write(subont, query_ids=matches)

![img](output/medgen-crn.png)