# Extract Molecular Initiating Events from AOP-Wiki for nanomaterials

- Step 1 - Setup
- Step 2 - Extracting all Molecular Initiating Events
- Step 3 - Searching the data for Molecular Initiating Events based on literature search outcome
- Step 4 - Combine, clean and export output


## Step 1 - Setup
### Imports

In [None]:
import sys

!{sys.executable} -m pip install --upgrade pip 
!{sys.executable} -m pip install watermark

try:
    import pandas as pd
except ImportError:
    !{sys.executable} -m pip install pandas
    import pandas as pd

try:
    from SPARQLWrapper import SPARQLWrapper, JSON
except ImportError:
    !{sys.executable} -m pip install sparqlwrapper
    from SPARQLWrapper import SPARQLWrapper, JSON

### Functions

In [None]:
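def search(df, term):
    """Return the IDs of rows whose Title or Description contains `term` (case-insensitive)."""
    needle = term.lower()
    title_hits = []
    description_hits = []
    for _, row in df.iterrows():
        if needle in row['Title'].lower():
            title_hits.append(row['ID'])
        if needle in row['Description'].lower():
            description_hits.append(row['ID'])
    # Report where the term was found, or that it was not found at all
    found = set(title_hits) | set(description_hits)
    if not found:
        print(term + " not found")
    if title_hits:
        print(term + " found in Title of " + ','.join(title_hits))
    if description_hits:
        print(term + " found in Description of " + ','.join(description_hits))
    return found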
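
The `search` function above prints its findings and assumes every Title and Description is a string. As a minimal sketch of an alternative, the hypothetical helper `search_fields` below (not part of the original notebook) returns the matching IDs per column instead of printing them, and treats missing values as non-matches. The toy data frame and its IDs are made up for illustration.

```python
import pandas as pd

def search_fields(df, term):
    """Hypothetical variant of search(): return matching IDs per column."""
    # regex=False gives a plain substring match; na=False treats missing text as "no match"
    in_title = df["Title"].str.contains(term, case=False, regex=False, na=False)
    in_description = df["Description"].str.contains(term, case=False, regex=False, na=False)
    return {
        "Title": set(df.loc[in_title, "ID"]),
        "Description": set(df.loc[in_description, "ID"]),
    }

# Made-up rows, only to exercise the helper
toy = pd.DataFrame({
    "ID": ["A", "B", "C"],
    "Title": ["Increase, ROS generation", "Inhibition, enzyme X", "Increase, DNA damage"],
    "Description": ["Made-up text", None, "Damage caused by reactive oxygen species"],
})

hits = search_fields(toy, "ros generation")
print(hits)
```

Returning a dictionary keeps the title and description matches separate, which may be convenient if the two kinds of hit are weighted differently later on.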

## Step 2 - Extracting all Molecular Initiating Events

In [None]:
sparqlendpoint = SPARQLWrapper("https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/")

In [12]:
sparqlquery = '''
    SELECT DISTINCT (fn:substring(?MIE_ID,4) as ?ID) (str(?MIE_Title) as ?Title) (str(?MIE_Description) as ?Description) ?MIE
    WHERE {
    ?AOP a aopo:AdverseOutcomePathway; aopo:has_molecular_initiating_event ?MIE.
    ?MIE rdfs:label ?MIE_ID; dc:description ?MIE_Description; dc:title ?MIE_Title
    }
    '''

sparqlendpoint.setQuery(sparqlquery)
sparqlendpoint.setReturnFormat(JSON)  
results = sparqlendpoint.query().convert()

# Flatten the SPARQL JSON bindings into one row per MIE
rows = [
    [b["ID"]["value"], b["Title"]["value"], b["Description"]["value"], b["MIE"]["value"]]
    for b in results["results"]["bindings"]
]
df = pd.DataFrame(rows, columns=["ID","Title","Description","URL"])

# IDs are strings, so this sort is lexicographic (e.g. 1002 comes before 112)
df = df.sort_values(by=['ID'])

display(df)

| | ID | Title | Description | URL |
|---|---|---|---|---|
| 23 | 1002 | Inhibition, Deiodinase 2 | Taxonomic: Deiodination by DIO enzymes is know... | https://identifiers.org/aop.events/1002 |
| 24 | 1009 | Inhibition, Deiodinase 1 | Taxonomic: Deiodination by DIO enzymes is know... | https://identifiers.org/aop.events/1009 |
| 63 | 112 | Antagonism, Estrogen receptor | Taxonomic applicability: Steroid receptors, in... | https://identifiers.org/aop.events/112 |
| 59 | 1194 | Increase, DNA damage | DNA nucleotide damage, single, and double stra... | https://identifiers.org/aop.events/1194 |
| 25 | 12 | Acetylcholinesterase (AchE) Inhibition | AChE is present in all life stages of both ver... | https://identifiers.org/aop.events/12 |
| 17 | 122 | Activation, Glucocorticoid Receptor | Glucocorticoid receptor is fairly conserved ac... | https://identifiers.org/aop.events/122 |
| 32 | 1252 | Binding to (interferes with) topoisomerase II ... | DNA topoisomerases are ubiquitous enzymes, whi... | https://identifiers.org/aop.events/1252 |
| 33 | 1270 | Inactivation of PPARγ | Following pulmonary exposure, the stressor int... | https://identifiers.org/aop.events/1270 |
| 36 | 1386 | CYP7B activity, inhibition | CYP7B&nbsp;is known to be conserved in chimpan... | https://identifiers.org/aop.events/1386 |
| 37 | 1391 | Activation of Cyp2E1 | Taxonomic applicability:&nbsp;The Cyp2E1 gene ... | https://identifiers.org/aop.events/1391 |
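
The ID column holds strings, so `sort_values` orders it lexicographically, which is why 1002 precedes 112 in the output above. If numeric order is preferred, one option is the `key` argument of `sort_values`. The following is a sketch on a toy frame built from four of the IDs shown above, assuming every ID is a plain integer string; it is not part of the original notebook.

```python
import pandas as pd

# Four of the IDs from the output above, stored as strings like in the notebook
toy = pd.DataFrame({"ID": ["1002", "112", "12", "1194"]})

# Default behaviour: string comparison, character by character
as_text = toy.sort_values(by="ID")["ID"].tolist()

# Numeric order: convert to int only for the comparison, the column itself stays text
as_number = toy.sort_values(by="ID", key=lambda s: s.astype(int))["ID"].tolist()

print(as_text)
print(as_number)
```

Because the conversion happens only inside `key`, the IDs remain strings and the `search` function can still join them with `','.join(...)`.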


## Step 3 - Searching the data for Molecular Initiating Events based on literature search outcome

In [18]:
searchterms = ['reactive oxygen species','ROS formation','ROS generation']

In [19]:
for item in searchterms:
    output = search(df, item)
    if output:
        print(item + ' gives ' + str(output))

reactive oxygen species found in Title of 1753
reactive oxygen species found in Description of 1194
reactive oxygen species gives {'1194', '1753'}
ROS formation not found
ROS generation found in Title of 1592
ROS generation gives {'1592'}
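
When only the combined set of matching IDs matters, the per-term loop can be collapsed into a single pass. The sketch below joins the search terms into one regular expression and applies it with pandas string methods. The toy data frame and its IDs are made up, and `pattern`, `mask` and `matched_ids` are illustrative names rather than part of the original notebook.

```python
import re
import pandas as pd

searchterms = ['reactive oxygen species', 'ROS formation', 'ROS generation']

# Made-up rows, only to exercise the pattern
toy = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "Title": ["Increase, ROS generation", "Inhibition, enzyme X",
              "Increase, DNA damage", "Activation, receptor Y"],
    "Description": ["Made-up text", None,
                    "Damage caused by Reactive Oxygen Species", "Unrelated text"],
})

# Escape each term so it is matched literally, then combine with "or"
pattern = "|".join(re.escape(term) for term in searchterms)

# Check each column separately so a match cannot span Title and Description
mask = (
    toy["Title"].str.contains(pattern, case=False, regex=True, na=False)
    | toy["Description"].str.contains(pattern, case=False, regex=True, na=False)
)
matched_ids = set(toy.loc[mask, "ID"])
print(matched_ids)
```

This variant no longer reports which term matched or in which column, so it complements the loop above rather than replacing it.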


In [15]:
# Some events are missing because our SPARQL query only returns the ones that are indicated as MIE for AOPs. We could filter on "molecular" instead for more results.
