# Ontology Exercise 

This notebook works through an ontology exercise I was assigned as a programming exercise. For more general information, please see the README at the base of this repository.

Our first task is to import the libraries we want to work with. Beyond the standard `pandas` import, we will use `rdflib` to parse the input Turtle (.ttl) file and to run the SPARQL queries we were asked to use for IDs and labels, and `rdfpandas` to convert the parsed graph to a DataFrame. We'll be querying web APIs with `requests`, so naturally we will be dealing with JSON data. Ultimately, this exercise will produce a CSV output file.

In [494]:
import rdflib
from rdfpandas.graph import to_dataframe
import pandas as pd
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import urllib.parse
import json
import os

## Step 1: Read in Ontology File
First we'll navigate so that our working directory is where our input file is. Then we'll be able to read in the file.

In [8]:
os.getcwd()

'/home/zelgius/Github/ontology-exercise/documentation'

In [9]:
os.chdir('../input')
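
As a hedged alternative to changing the working directory, the input path could be built relative to the notebook's folder with `pathlib`. The helper name `resolve_input` is hypothetical, and the layout (an `input` folder next to `documentation`) is an assumption taken from the paths shown above.

```python
from pathlib import Path


def resolve_input(notebook_dir: str, filename: str) -> Path:
    """Return the path of `filename` in the sibling `input` folder (assumed layout)."""
    # Go one level up from the notebook folder, then down into `input`.
    return Path(notebook_dir).parent / "input" / filename


ttl_path = resolve_input(
    "/home/zelgius/Github/ontology-exercise/documentation",
    "programming_exercise.skos.ttl",
)
print(ttl_path)
```

The resulting path could then be passed to `g.parse(str(ttl_path), format='ttl')` without calling `os.chdir()` first.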

## Step 2: Load into Data Structure
Scanning the [`rdfpandas` documentation](https://github.com/cadmiumkitty/rdfpandas) quickly, we see the syntax is very similar to that of `rdflib`; we merely add one extra step with the function `to_dataframe()`. Let's convert the provided Turtle file to a DataFrame:

In [23]:
g = rdflib.Graph()
g.parse('programming_exercise.skos.ttl', format = 'ttl')
df = to_dataframe(g)

Let's have an overview of our DataFrame:

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 437 entries, http://purl.obolibrary.org/obo/DOID_0040098 to https://localhost:8443/ontology/Applicanttest/APPLICANTTEST
Data columns (total 21 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   rdf:type{URIRef}               437 non-null    object
 1   skos:altLabel{Literal}[0]@en   337 non-null    object
 2   skos:altLabel{Literal}[1]@en   207 non-null    object
 3   skos:altLabel{Literal}[2]@en   111 non-null    object
 4   skos:altLabel{Literal}[3]@en   57 non-null     object
 5   skos:altLabel{Literal}[4]@en   29 non-null     object
 6   skos:altLabel{Literal}[5]@en   20 non-null     object
 7   skos:altLabel{Literal}[6]@en   14 non-null     object
 8   skos:altLabel{Literal}[7]@en   8 non-null      object
 9   skos:altLabel{Literal}[8]@en   6 non-null      object
 10  skos:altLabel{Literal}[9]@en   5 non-null      object
 11  skos:altLabel{Literal}[10]@en 

In [20]:
df[0:20]

Unnamed: 0,rdf:type{URIRef},skos:altLabel{Literal}[0]@en,skos:altLabel{Literal}[1]@en,skos:altLabel{Literal}[2]@en,skos:altLabel{Literal}[3]@en,skos:altLabel{Literal}[4]@en,skos:altLabel{Literal}[5]@en,skos:altLabel{Literal}[6]@en,skos:altLabel{Literal}[7]@en,skos:altLabel{Literal}[8]@en,...,skos:altLabel{Literal}[10]@en,skos:altLabel{Literal}[11]@en,skos:broader{URIRef}[0],skos:broader{URIRef}[1],skos:definition{Literal},skos:inScheme{URIRef},skos:prefLabel{Literal}@en,dcterms:title{Literal}@en,skos:hasTopConcept{URIRef},skos:topConceptOf{URIRef}
http://purl.obolibrary.org/obo/DOID_0040098,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_0060039,,An autoimmune disease of skin and connective t...,https://localhost:8443/ontology/Applicanttest/...,pemphigus gestationis,,,
http://purl.obolibrary.org/obo/DOID_0050072,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A primary systemic mycosis that is a fungal in...,https://localhost:8443/ontology/Applicanttest/...,adiaspiromycosis,,,
http://purl.obolibrary.org/obo/DOID_0050096,skos:Concept,dermatophytosis of beard,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A dermatophytosis that results_in fungal infec...,https://localhost:8443/ontology/Applicanttest/...,tinea barbae,,,
http://purl.obolibrary.org/obo/DOID_0050116,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_12179,,A tinea corporis that results_in fungal infect...,https://localhost:8443/ontology/Applicanttest/...,tinea imbricata,,,
http://purl.obolibrary.org/obo/DOID_0050135,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A fungal infectious disease that results_in in...,https://localhost:8443/ontology/Applicanttest/...,subcutaneous mycosis,,,
http://purl.obolibrary.org/obo/DOID_0050169,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A lupus erythematosus that causes skin lesions...,https://localhost:8443/ontology/Applicanttest/...,cutaneous lupus erythematosus,,,
http://purl.obolibrary.org/obo/DOID_0050185,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A skin disease that is a type of allergic reac...,https://localhost:8443/ontology/Applicanttest/...,erythema multiforme,,,
http://purl.obolibrary.org/obo/DOID_0050251,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A parasitic helminthiasis infectious disease t...,https://localhost:8443/ontology/Applicanttest/...,coenurosis,,,
http://purl.obolibrary.org/obo/DOID_0050260,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_37,,A parasitic helminthiasis infectious disease t...,https://localhost:8443/ontology/Applicanttest/...,dioctophymiasis,,,
http://purl.obolibrary.org/obo/DOID_0050278,skos:Concept,,,,,,,,,,...,,,http://purl.obolibrary.org/obo/DOID_0050135,,A subcutaneous mycosis that involves a chronic...,https://localhost:8443/ontology/Applicanttest/...,basidiobolomycosis,,,


## Step 3: Extract IDs and Preferred Labels 

Now we need to extract IDs and preferred labels from our file, with the caveat that the concepts _must_ be within the "skin cancer" branch. From a quick query of OBO, we see this ID is [`http://purl.obolibrary.org/obo/DOID_4159`](http://www.ontobee.org/ontology/DOID?iri=http://purl.obolibrary.org/obo/DOID_4159). Thus, we only want concepts that reach DOID:4159 by following the parent predicate `skos:broader` zero or more times. The `skos:broader*` property path in the query below expresses exactly that, so it returns the branch root itself along with all of its descendants, not just its direct children. We can use SPARQL to filter the data.

In [495]:
results = g.query("""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept ?label
    WHERE {
    
    ?concept skos:prefLabel ?label .
    ?concept skos:broader* <http://purl.obolibrary.org/obo/DOID_4159> .
    }
""")

print('Results!')
print(type(results))
for row in results:
    print(row)

Results!
<class 'rdflib.plugins.sparql.processor.SPARQLResult'>
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_4159'), rdflib.term.Literal('skin cancer', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_13168'), rdflib.term.Literal('prepuce cancer', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_8923'), rdflib.term.Literal('skin melanoma', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_6367'), rdflib.term.Literal('acral lentiginous melanoma', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_10047'), rdflib.term.Literal('nodular malignant melanoma', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_10054'), rdflib.term.Literal('skin amelanotic melanoma', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_10044'), rdflib.term.Literal('balloon cell malignant melanoma', lang='en'))
(rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_10040'), rdflib.term.Li

### Step 3: Results
Our results look accurate (we can double check by browsing the "Skin cancer" hierarchy on OBO to make sure we got all the grandchildren).  
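
To make the effect of the `skos:broader*` path more concrete, here is a minimal pure-Python sketch of the same idea: start at the root and collect everything that reaches it through "broader" links, including the root itself. The toy hierarchy and the function name `branch` are hypothetical and are not taken from the input file.

```python
from collections import deque

# Hypothetical toy hierarchy: child -> list of broader (parent) concepts.
broader = {
    "skin melanoma": ["skin cancer"],
    "acral lentiginous melanoma": ["skin melanoma"],
    "prepuce cancer": ["skin cancer"],
    "tinea barbae": ["dermatophytosis"],
}


def branch(root: str, broader: dict) -> set:
    """Return root plus every concept that reaches root via broader links."""
    # Invert the mapping so we can walk from a parent down to its children.
    narrower = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(child)

    seen = {root}  # zero steps: the root itself matches, as with `*`
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in narrower.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(branch("skin cancer", broader)))
```

Concepts outside the branch (here, "tinea barbae") are never visited, which mirrors how the query leaves out the rest of the 437 subjects in the graph.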

However, you'll notice the objects returned are a special type, "SPARQLResult." We'll need to get this back into Python data types and into another DataFrame. We can look to GitHub, specifically the [SPARQLWrapper repo issues](https://github.com/RDFLib/sparqlwrapper/issues/125) for some function ideas. Here's one solution:

In [496]:
from pandas import DataFrame
from rdflib.plugins.sparql.processor import SPARQLResult

def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )

df1 = sparql_results_to_df(results)

In [497]:
df1

Unnamed: 0,concept,label
0,http://purl.obolibrary.org/obo/DOID_4159,skin cancer
1,http://purl.obolibrary.org/obo/DOID_13168,prepuce cancer
2,http://purl.obolibrary.org/obo/DOID_8923,skin melanoma
3,http://purl.obolibrary.org/obo/DOID_6367,acral lentiginous melanoma
4,http://purl.obolibrary.org/obo/DOID_10047,nodular malignant melanoma
...,...,...
75,http://purl.obolibrary.org/obo/DOID_6446,ceruminous adenocarcinoma
76,http://purl.obolibrary.org/obo/DOID_4284,anal margin carcinoma
77,http://purl.obolibrary.org/obo/DOID_7708,perianal skin Paget's disease
78,http://purl.obolibrary.org/obo/DOID_12239,anal margin squamous cell carcinoma


This is much more readable. However, we have concept URIs where we'd prefer to have DOIDs we can pass to the API. So let's do some regex replacement on the values:

In [498]:
df1['concept'] = df1['concept'].str.replace(r'_', ':', regex=True)
df1['concept'] = df1['concept'].str.replace(r'^http://purl.obolibrary.org/obo/', '', regex=True)
df1['MONDO_IRI'] = ''
df1

Unnamed: 0,concept,label,MONDO_IRI
0,DOID:4159,skin cancer,
1,DOID:13168,prepuce cancer,
2,DOID:8923,skin melanoma,
3,DOID:6367,acral lentiginous melanoma,
4,DOID:10047,nodular malignant melanoma,
...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,
76,DOID:4284,anal margin carcinoma,
77,DOID:7708,perianal skin Paget's disease,
78,DOID:12239,anal margin squamous cell carcinoma,
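
The same conversion can also be sketched as a plain function, which makes the intent explicit and is easy to check on a single value. The name `iri_to_curie` is hypothetical. Unlike the regex cell, this version replaces only the first underscore after the namespace and leaves non-OBO strings untouched; that is a design assumption, not something the exercise requires.

```python
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def iri_to_curie(iri: str) -> str:
    """Convert an OBO IRI such as .../DOID_4159 into the CURIE DOID:4159."""
    if not iri.startswith(OBO_PREFIX):
        return iri  # leave anything unexpected untouched
    local_id = iri[len(OBO_PREFIX):]
    # Replace only the first underscore, which separates prefix and number.
    return local_id.replace("_", ":", 1)


print(iri_to_curie("http://purl.obolibrary.org/obo/DOID_4159"))
```

Applied to the original IRI column, this could be used as `df1['concept'].map(iri_to_curie)`.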


## Step 4: Retrieve Mappings to Other Ontologies

Now that we have our IDs and preferred labels, we need to get mappings to other ontologies.  

For our preferred labels, the plan was to get mappings to MeSH and EFO via the [EBI OLS API](https://www.ebi.ac.uk/ols/docs/api). Querying the list of all ontologies first shows what we have access to, and MeSH is not available. OXO suggests why: it classifies MeSH as a "Database" rather than an "Ontology." Therefore, if our calls to OLS specify MeSH as an ontology, they won't return anything.

A compromise we can make in this case is to use MONDO in place of MeSH for the label lookups. This might be desirable because:
* In some test API calls, the MONDO hits score very high (about as high as EFO hits)
* [MONDO data sources](https://mondo.monarchinitiative.org/pages/sources/) seem encouraging, and include MeSH as a data source

For our IDs, we will get mappings to MeSH and EFO via the [EBI OXO API](https://www.ebi.ac.uk/spot/oxo/docs/api). Some sample queries show that the responses can be a bit messy, so we will have to parse that JSON carefully to get the right hits.

### First up: OLS Exploration

We want to explore the API a bit and see what kind of responses we get. Let's use `requests` to make a sample call for the term "anal margin carcinoma" while specifying MONDO as the ontology:

In [343]:
url = "http://www.ebi.ac.uk/ols/api/select?q=anal%20margin%20carcinoma&queryFields=label&ontology=mondo&fieldList=id,iri,label,score,synonym"
response = requests.get(url).json()

In [344]:
response['response']

{'numFound': 3,
 'start': 0,
 'maxScore': 10.319454,
 'docs': [{'id': 'mondo:class:http://purl.obolibrary.org/obo/MONDO_0002941',
   'iri': 'http://purl.obolibrary.org/obo/MONDO_0002941',
   'label': 'anal margin carcinoma',
   'score': 10.319454},
  {'id': 'mondo:class:http://purl.obolibrary.org/obo/MONDO_0002940',
   'iri': 'http://purl.obolibrary.org/obo/MONDO_0002940',
   'label': 'anal margin basal cell carcinoma',
   'score': 0.010781731},
  {'id': 'mondo:class:http://purl.obolibrary.org/obo/MONDO_0001470',
   'iri': 'http://purl.obolibrary.org/obo/MONDO_0001470',
   'label': 'anal margin squamous cell carcinoma',
   'score': 0.010781731}]}

In [345]:
response['response']['docs'][0]['iri']

'http://purl.obolibrary.org/obo/MONDO_0002941'
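
Since the scores in this response differ so sharply (about 10.3 for the exact label versus about 0.01 for the others), one option is to pick the best hit by score rather than by position. The sketch below works on a trimmed copy of the response above. The function name `top_hit_iri` and the `min_score` threshold are hypothetical additions, not part of the OLS API.

```python
def top_hit_iri(payload: dict, min_score: float = 0.0):
    """Return the IRI of the best-scoring doc, or None if nothing qualifies."""
    docs = payload.get("response", {}).get("docs", [])
    # Drop hits below the (hypothetical) score threshold.
    candidates = [d for d in docs if d.get("score", 0.0) >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.get("score", 0.0))["iri"]


# Trimmed copy of the response shown above.
sample = {
    "response": {
        "numFound": 3,
        "docs": [
            {"iri": "http://purl.obolibrary.org/obo/MONDO_0002941",
             "label": "anal margin carcinoma", "score": 10.319454},
            {"iri": "http://purl.obolibrary.org/obo/MONDO_0002940",
             "label": "anal margin basal cell carcinoma", "score": 0.010781731},
        ],
    }
}

print(top_hit_iri(sample))
```

A threshold would let weak partial matches be treated as "No result" instead of being accepted as mappings; a sensible cut-off would have to be chosen by inspecting real responses.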

In [499]:
labels = df1['label']
print(labels)

0                             skin cancer
1                          prepuce cancer
2                           skin melanoma
3              acral lentiginous melanoma
4              nodular malignant melanoma
                     ...                 
75              ceruminous adenocarcinoma
76                  anal margin carcinoma
77          perianal skin Paget's disease
78    anal margin squamous cell carcinoma
79                 labia majora carcinoma
Name: label, Length: 80, dtype: object


In [500]:
iris = []
for label in labels:
    # URL-encode the label so spaces and apostrophes are safe in the query string
    encodedLabel = urllib.parse.quote(label)
    url = "http://www.ebi.ac.uk/ols/api/select?q="+encodedLabel+"&queryFields=label&ontology=mondo&fieldList=id,iri,label,score,synonym"
    response = requests.get(url).json()
    if response['response']['numFound'] >= 1:
        # Keep the first hit (the highest-scoring one in the sample above)
        # and strip the OBO namespace to leave the local ID
        iri_hit = response['response']['docs'][0]['iri']
        iris.append(iri_hit.replace("http://purl.obolibrary.org/obo/", ""))
    else:
        iris.append("No result")
df1['MONDO_IRI'] = iris

In [501]:
df1

Unnamed: 0,concept,label,MONDO_IRI
0,DOID:4159,skin cancer,MONDO_0002898
1,DOID:13168,prepuce cancer,MONDO_0001653
2,DOID:8923,skin melanoma,MONDO_0005208
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865
4,DOID:10047,nodular malignant melanoma,MONDO_0000930
...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941
77,DOID:7708,perianal skin Paget's disease,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470
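
The query URL in the loop above is assembled by string concatenation. A hedged alternative is to let `urllib.parse.urlencode` build the query string from a dictionary, which keeps the parameters readable and encodes every value. The function name `build_ols_url` is hypothetical; the endpoint and parameters are the ones already used in this notebook.

```python
from urllib.parse import quote, urlencode

OLS_SELECT = "http://www.ebi.ac.uk/ols/api/select"


def build_ols_url(label: str, ontology: str) -> str:
    """Build the OLS select URL for one label and one ontology."""
    params = {
        "q": label,
        "queryFields": "label",
        "ontology": ontology,
        "fieldList": "id,iri,label,score,synonym",
    }
    # quote_via=quote encodes spaces as %20; safe="," keeps the field list readable.
    return OLS_SELECT + "?" + urlencode(params, safe=",", quote_via=quote)


print(build_ols_url("anal margin carcinoma", "mondo"))
```

Because the ontology is a parameter, the same helper would serve both the MONDO and the EFO lookups.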


In [502]:
iris = []
for label in labels:
    # URL-encode the label so spaces and apostrophes are safe in the query string
    encodedLabel = urllib.parse.quote(label)
    url = "http://www.ebi.ac.uk/ols/api/select?q="+encodedLabel+"&queryFields=label&ontology=efo&fieldList=id,iri,label,score,synonym"
    response = requests.get(url).json()
    if response['response']['numFound'] >= 1:
        # Keep the first hit and strip the EFO namespace; IRIs from other
        # namespaces pass through unchanged and are cleaned up in a later cell
        iri_hit = response['response']['docs'][0]['iri']
        iris.append(iri_hit.replace("http://www.ebi.ac.uk/efo/", ""))
    else:
        iris.append("No result")
df1['EFO_IRI'] = iris

In [503]:
df1

Unnamed: 0,concept,label,MONDO_IRI,EFO_IRI
0,DOID:4159,skin cancer,MONDO_0002898,http://purl.obolibrary.org/obo/MONDO_0002898
1,DOID:13168,prepuce cancer,MONDO_0001653,No result
2,DOID:8923,skin melanoma,MONDO_0005208,EFO_0000389
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865,http://purl.obolibrary.org/obo/MONDO_0003865
4,DOID:10047,nodular malignant melanoma,MONDO_0000930,EFO_0008515
...,...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941,No result
77,DOID:7708,perianal skin Paget's disease,No result,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470,No result


In [504]:
df1['EFO_IRI'] = df1.EFO_IRI.str.replace(r'^http://purl.obolibrary.org/obo/.+', r'No result', regex=True)

In [505]:
df1

Unnamed: 0,concept,label,MONDO_IRI,EFO_IRI
0,DOID:4159,skin cancer,MONDO_0002898,No result
1,DOID:13168,prepuce cancer,MONDO_0001653,No result
2,DOID:8923,skin melanoma,MONDO_0005208,EFO_0000389
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865,No result
4,DOID:10047,nodular malignant melanoma,MONDO_0000930,EFO_0008515
...,...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941,No result
77,DOID:7708,perianal skin Paget's disease,No result,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470,No result
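
The MONDO IRIs that showed up in the `EFO_IRI` column suggest that a search restricted to `ontology=efo` can still return terms whose IRIs live outside the EFO namespace. Instead of cleaning these up afterwards with a regex, the check could be made when the IRI is extracted. This is a minimal sketch; the function name `efo_local_id` is hypothetical.

```python
EFO_PREFIX = "http://www.ebi.ac.uk/efo/"


def efo_local_id(iri: str) -> str:
    """Return the local ID (e.g. EFO_0000389) for EFO-namespace IRIs only."""
    if iri.startswith(EFO_PREFIX):
        return iri[len(EFO_PREFIX):]
    # Anything from another namespace is treated like a miss,
    # using the same placeholder as the rest of the notebook.
    return "No result"


print(efo_local_id("http://www.ebi.ac.uk/efo/EFO_0000389"))
print(efo_local_id("http://purl.obolibrary.org/obo/MONDO_0002898"))
```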


In [506]:
df1["OLS_Mappings"] = df1[['MONDO_IRI', 'EFO_IRI']].agg('; '.join, axis=1)
df1['OLS_Mappings'] = df1['OLS_Mappings'].str.replace('No result; ', '')
df1['OLS_Mappings'] = df1['OLS_Mappings'].str.replace('; No result', '')

In [507]:
df1

Unnamed: 0,concept,label,MONDO_IRI,EFO_IRI,OLS_Mappings
0,DOID:4159,skin cancer,MONDO_0002898,No result,MONDO_0002898
1,DOID:13168,prepuce cancer,MONDO_0001653,No result,MONDO_0001653
2,DOID:8923,skin melanoma,MONDO_0005208,EFO_0000389,MONDO_0005208; EFO_0000389
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865,No result,MONDO_0003865
4,DOID:10047,nodular malignant melanoma,MONDO_0000930,EFO_0008515,MONDO_0000930; EFO_0008515
...,...,...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result,No result,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941,No result,MONDO_0002941
77,DOID:7708,perianal skin Paget's disease,No result,No result,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470,No result,MONDO_0001470
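
The join-then-strip approach above works, but it relies on the exact placeholder text appearing next to the separator. A hedged alternative is to filter out the placeholders before joining. The helper name `join_mappings` is hypothetical. Note that it also skips empty strings, which the string-replacement cells do not.

```python
NO_RESULT = "No result"


def join_mappings(*values: str, sep: str = "; ") -> str:
    """Join the real hits with `sep`; fall back to the placeholder if none."""
    hits = [v for v in values if v and v != NO_RESULT]
    return sep.join(hits) if hits else NO_RESULT


print(join_mappings("MONDO_0005208", "EFO_0000389"))
print(join_mappings("MONDO_0002898", "No result"))
print(join_mappings("No result", "No result"))
```

With pandas, this could be applied row by row, for example `df1[['MONDO_IRI', 'EFO_IRI']].agg(lambda row: join_mappings(*row), axis=1)`.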


### Next: OXO Exploration

Now we'd like to query the OXO service using our list of IDs. Our targets are MeSH and EFO. So let's dive in.

In [381]:
# Sample call for 'skin melanoma'
url = "https://www.ebi.ac.uk/spot/oxo/api/mappings?fromId=DOID:8923"
response = requests.get(url).json()

In [378]:
# To find MeSH terms
for r in response['_embedded']['mappings']:
    if r['toTerm']['datasource']['prefix'] == 'MeSH':
        print(r['toTerm']['uri'])

http://identifiers.org/MeSH:C562393
http://identifiers.org/MeSH:C562393


In [380]:
# To find EFO terms
for r in response['_embedded']['mappings']:
    if r['fromTerm']['datasource']['prefix'] == 'EFO':
        print(r['fromTerm']['uri'])

http://www.ebi.ac.uk/efo/EFO_0000389


In [508]:
ids = df1['concept']
print(ids[0:20])

0      DOID:4159
1     DOID:13168
2      DOID:8923
3      DOID:6367
4     DOID:10047
5     DOID:10054
6     DOID:10044
7     DOID:10040
8      DOID:3451
9      DOID:4871
10     DOID:3450
11     DOID:6425
12     DOID:3965
13     DOID:2513
14     DOID:4300
15     DOID:4304
16     DOID:4303
17     DOID:4302
18     DOID:4301
19     DOID:4283
Name: concept, dtype: object


In [509]:
meshes = []
efos = []
# Reuse one session and retry failed connections with a backoff between attempts
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

for doid in ids:
    url = "https://www.ebi.ac.uk/spot/oxo/api/mappings?fromId="+doid
    response = session.get(url).json()
    try:
        mappings = response['_embedded']['mappings']
        mesh_hits = [m['toTerm']['curie'] for m in mappings if m['toTerm']['datasource']['prefix'] == 'MeSH']
        efo_hits = [m['fromTerm']['curie'] for m in mappings if m['fromTerm']['datasource']['prefix'] == 'EFO']
        # Keep the first MeSH hit; an empty list raises IndexError and is handled below
        meshes.append(mesh_hits[0])
        efos.append('; '.join(efo_hits))
    except (KeyError, IndexError, TypeError):
        # No mappings in the response, or no MeSH hit: record both as missing
        meshes.append('No result')
        efos.append('No result')

In [510]:
df1['MESH_ID'] = meshes
df1['EFO_ID'] = efos
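
One consequence of the loop above is that an ID with an EFO mapping but no MeSH mapping ends up as "No result" in both columns, because the missing MeSH hit triggers the `except` branch. A possible refinement is to extract each target independently. The sketch below uses a hand-written payload containing only the fields this notebook reads; it is not real API output. The helper name `curies_by_prefix` is hypothetical, and checking both `fromTerm` and `toTerm` is an assumption that differs from the loop, which checks one side per target.

```python
def curies_by_prefix(payload: dict, prefix: str) -> list:
    """Collect unique CURIEs with the given datasource prefix from either side."""
    found = []
    for mapping in payload.get("_embedded", {}).get("mappings", []):
        for side in ("fromTerm", "toTerm"):
            term = mapping.get(side, {})
            if term.get("datasource", {}).get("prefix") == prefix:
                curie = term.get("curie")
                if curie and curie not in found:
                    found.append(curie)  # keep first-seen order, skip duplicates
    return found


# Hand-written payload for illustration only.
doid = {"curie": "DOID:8923", "datasource": {"prefix": "DOID"}}
mesh = {"curie": "MeSH:C562393", "datasource": {"prefix": "MeSH"}}
efo = {"curie": "EFO:0000389", "datasource": {"prefix": "EFO"}}
sample = {
    "_embedded": {
        "mappings": [
            {"fromTerm": doid, "toTerm": mesh},
            {"fromTerm": doid, "toTerm": mesh},  # duplicate, as seen in the MeSH output
            {"fromTerm": efo, "toTerm": doid},
        ]
    }
}

print(curies_by_prefix(sample, "MeSH"))
print(curies_by_prefix(sample, "EFO"))
```

An empty list for one target then no longer affects the other, and each column can fall back to "No result" on its own.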

In [511]:
df1

Unnamed: 0,concept,label,MONDO_IRI,EFO_IRI,OLS_Mappings,MESH_ID,EFO_ID
0,DOID:4159,skin cancer,MONDO_0002898,No result,MONDO_0002898,MeSH:D012878,EFO:0004198
1,DOID:13168,prepuce cancer,MONDO_0001653,No result,MONDO_0001653,No result,No result
2,DOID:8923,skin melanoma,MONDO_0005208,EFO_0000389,MONDO_0005208; EFO_0000389,MeSH:C562393,EFO:0000389
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865,No result,MONDO_0003865,No result,No result
4,DOID:10047,nodular malignant melanoma,MONDO_0000930,EFO_0008515,MONDO_0000930; EFO_0008515,No result,No result
...,...,...,...,...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result,No result,No result,No result,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941,No result,MONDO_0002941,No result,No result
77,DOID:7708,perianal skin Paget's disease,No result,No result,No result,No result,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470,No result,MONDO_0001470,No result,No result


In [512]:
df1["OXO_Mappings"] = df1[['MESH_ID', 'EFO_ID']].agg('; '.join, axis=1)
df1['OXO_Mappings'] = df1['OXO_Mappings'].str.replace('No result; ', '')
df1['OXO_Mappings'] = df1['OXO_Mappings'].str.replace('; No result', '')

In [513]:
df1

Unnamed: 0,concept,label,MONDO_IRI,EFO_IRI,OLS_Mappings,MESH_ID,EFO_ID,OXO_Mappings
0,DOID:4159,skin cancer,MONDO_0002898,No result,MONDO_0002898,MeSH:D012878,EFO:0004198,MeSH:D012878; EFO:0004198
1,DOID:13168,prepuce cancer,MONDO_0001653,No result,MONDO_0001653,No result,No result,No result
2,DOID:8923,skin melanoma,MONDO_0005208,EFO_0000389,MONDO_0005208; EFO_0000389,MeSH:C562393,EFO:0000389,MeSH:C562393; EFO:0000389
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865,No result,MONDO_0003865,No result,No result,No result
4,DOID:10047,nodular malignant melanoma,MONDO_0000930,EFO_0008515,MONDO_0000930; EFO_0008515,No result,No result,No result
...,...,...,...,...,...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result,No result,No result,No result,No result,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941,No result,MONDO_0002941,No result,No result,No result
77,DOID:7708,perianal skin Paget's disease,No result,No result,No result,No result,No result,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470,No result,MONDO_0001470,No result,No result,No result


## Step 5: Write Results to a Delimited File

Our DataFrame is now complete with mappings, so we just need to write it out to a CSV file.

In [514]:
df_final = df1[['concept', 'label', 'OLS_Mappings', 'OXO_Mappings']]
df_final

Unnamed: 0,concept,label,OLS_Mappings,OXO_Mappings
0,DOID:4159,skin cancer,MONDO_0002898,MeSH:D012878; EFO:0004198
1,DOID:13168,prepuce cancer,MONDO_0001653,No result
2,DOID:8923,skin melanoma,MONDO_0005208; EFO_0000389,MeSH:C562393; EFO:0000389
3,DOID:6367,acral lentiginous melanoma,MONDO_0003865,No result
4,DOID:10047,nodular malignant melanoma,MONDO_0000930; EFO_0008515,No result
...,...,...,...,...
75,DOID:6446,ceruminous adenocarcinoma,No result,No result
76,DOID:4284,anal margin carcinoma,MONDO_0002941,No result
77,DOID:7708,perianal skin Paget's disease,No result,No result
78,DOID:12239,anal margin squamous cell carcinoma,MONDO_0001470,No result


In [517]:
# Un-comment to actually make the file
# df_final.to_csv('../output/output.csv')