# Scheduled Integration of ClinGen Gene-Disease Validity Data into Wikidata

ClinGen (Clinical Genome Resource) curates data on gene-disease associations. <br>
License: CC0 (https://clinicalgenome.org/docs/terms-of-use/)

This scheduled bot uses WikidataIntegrator (WDI) to integrate ClinGen Gene-Disease Validity data into Wikidata. <br>
https://github.com/SuLab/GeneWikiCentral/issues/116 <br>
https://search.clinicalgenome.org/kb/gene-validity/ <br>

Python script contributions, in order: Sabah Ul-Hasan, Andra Waagmeester, Andrew Su

## Checks

- Login should automatically align with the given environment
- The for loop checks for both the HGNC Qid and the MONDO Qid in each row (i.e., if HGNC is absent or multiple, it still checks MONDO)
- The for loop handles the multiple-Qid case; tested on A2ML1 and corrected afterwards
- The for loop records the correct Qid for HGNC or MONDO, if available <br> <br>
- create_reference() adds references to an existing HGNC or MONDO value in the genetic association statement (it doesn't overwrite URLs from non-ClinGen sources)
- **Open question:** experimenting with 'update_retrieved_if_new_multiple_refs' (see for loop); 180 days is fine. <br>
How can 'skipped' be recorded in Status as part of the current function? <br>
More info: https://github.com/SuLab/WikidataIntegrator/tree/master/wikidataintegrator/ref_handlers
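
The 'skipped' question above could be handled outside the reference handler, before a row is written. The following is only a minimal sketch and not how `update_retrieved_if_new_multiple_refs` is implemented: `is_recent` and `status_for` are hypothetical helpers, and the sketch assumes the existing 'retrieved' (P813) value is already available as a Wikidata-style time string such as `+2019-10-01T00:00:00Z`.

```python
from datetime import datetime, timedelta

WD_TIME_FORMAT = "+%Y-%m-%dT%H:%M:%SZ"  # Wikidata-style time string (assumed input format)


def is_recent(retrieved, now, days=180):
    """Return True if `retrieved` is less than `days` days before `now`."""
    retrieved_dt = datetime.strptime(retrieved, WD_TIME_FORMAT)
    return now - retrieved_dt < timedelta(days=days)


def status_for(retrieved, now, days=180):
    """Hypothetical Status value: 'skipped' if a recent reference exists, else 'pending'."""
    if retrieved is not None and is_recent(retrieved, now, days):
        return "skipped"
    return "pending"


now = datetime(2019, 11, 1)
print(status_for("+2019-10-01T00:00:00Z", now))  # retrieved 31 days earlier
print(status_for("+2018-06-07T00:00:00Z", now))  # retrieved more than 180 days earlier
print(status_for(None, now))                     # no existing reference
```

A row whose status comes back as 'skipped' could then bypass the write step, leaving the 180-day logic in one place.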

## To Do

1) Update across entire dataframe <br>
2) Share full output file with ClinGen <br> <br>
3) Set up scheduled bot through proteinboxbot (update login) <br>
4) Run in Jenkins: http://jenkins.sulab.org/

In [1]:
### Relevant modules and libraries

# Installations by shell
!pip install --upgrade pip # Upgrades pip to the latest version
!pip3 install tqdm # Progress bar utility
!pip3 install termcolor # For color-coding printed output
!pip3 install wikidataintegrator # WikidataIntegrator (WDI), for reading and writing Wikidata

# Imports
from wikidataintegrator import wdi_core, wdi_login # Core and login modules of WDI
from wikidataintegrator.ref_handlers import update_retrieved_if_new_multiple_refs # Reference handler used when writing statements
import copy # Deep-copies the reference list so each statement gets its own copy
from datetime import datetime # For the current date and time
import time # For timing the for loop

import os # For reading and setting environment variables (WDUSER, WDPASS)

import pandas as pd # Dataframe handling, abbreviated to pd
import numpy as np # General-purpose numerical package
from termcolor import colored # Colors printed output

Collecting pip
  Using cached https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-19.3.1
Collecting tqdm
  Using cached https://files.pythonhosted.org/packages/e1/c1/bc1dba38b48f4ae3c4428aea669c5e27bd5a7642a74c8348451e0bd8ff86/tqdm-4.36.1-py2.py3-none-any.whl
Installing collected packages: tqdm
Successfully installed tqdm-4.36.1
Collecting termcolor
  Using cached https://files.pythonhosted.org/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Installing collected packages: termcolor
    Running setup.py install for termcolor ... done
Successfully installed termcolor-1.1.0
Collecting wikidataintegrator
  Using cached https://files.pythonhosted.org/packages/a4/4a/bfac10031

In [5]:
### Login for running WDI

print("Logging in...") 

# Enter your own username and password (to be updated to ProteinBoxBot)
os.environ["WDUSER"] = "username" # Sets the Wikidata username as an environment variable
os.environ["WDPASS"] = "password" # Sets the Wikidata password as an environment variable

# Raise an error if either credential is missing from the environment
if "WDUSER" in os.environ and "WDPASS" in os.environ: 
    WDUSER = os.environ['WDUSER']
    WDPASS = os.environ['WDPASS']
else: 
    raise ValueError("WDUSER and WDPASS must be specified in local.py or as environment variables")      

# Log in to Wikidata with these credentials and keep the session as 'login'
login = wdi_login.WDLogin(WDUSER, WDPASS) 

Logging in...
https://www.wikidata.org/w/api.php
Successfully logged in as Sulhasan
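
To avoid hardcoding credentials in the notebook, the environment check above can be factored into a small helper. This is a minimal sketch: `get_credentials` is a hypothetical name, and it accepts the environment mapping as a parameter so it can be tried without real credentials.

```python
import os


def get_credentials(env=os.environ):
    """Return (user, password) from WDUSER/WDPASS, or raise if either is missing."""
    missing = [key for key in ("WDUSER", "WDPASS") if not env.get(key)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return env["WDUSER"], env["WDPASS"]


# Demonstration with a plain dict standing in for os.environ
user, password = get_credentials({"WDUSER": "username", "WDPASS": "password"})
print(user)
```

With real environment variables set outside the notebook, the returned pair could be passed on as `wdi_login.WDLogin(user, password)`.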


In [68]:
### ClinGen gene-disease validity data

# Read as csv
df = pd.read_csv('https://search.clinicalgenome.org/kb/gene-validity.csv', skiprows=6, header=None)  

# Label column headings
df.columns = ['Gene', 'HGNC Gene ID', 'Disease', 'MONDO Disease ID','SOP','Classification','Report Reference URL','Report Date']

# Create a Wikidata-style time stamp (day precision) of when the data was downloaded (isoformat() raised an error here)
timeStringNow = datetime.now().strftime("+%Y-%m-%dT00:00:00Z")

df.head(6) # View first 6 rows

Unnamed: 0,Gene,HGNC Gene ID,Disease,MONDO Disease ID,SOP,Classification,Report Reference URL,Report Date
0,A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:37:47.175Z
1,A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:31:03.696Z
2,A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:34:05.324Z
3,A2ML1,HGNC:23336,Noonan syndrome,MONDO_0018997,SOP5,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:23:53.157Z
4,A2ML1,HGNC:23336,Noonan syndrome-like disorder with loose anage...,MONDO_0011899,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:40:11.599Z
5,AARS,HGNC:20,undetermined early-onset epileptic encephalopathy,MONDO_0018614,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2018-11-20T17:00:00.000Z
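
The `timeStringNow` value above is a time string truncated to the day, with a leading `+` and a trailing `Z`. A small sketch of the same formatting as a reusable function (`wikidata_day_string` is a hypothetical name) makes the expected shape explicit:

```python
from datetime import datetime


def wikidata_day_string(moment):
    """Format a datetime as a day-precision time string like the one used for 'retrieved'."""
    return moment.strftime("+%Y-%m-%dT00:00:00Z")


print(wikidata_day_string(datetime(2019, 10, 4, 20, 27, 42)))  # +2019-10-04T00:00:00Z
```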


In [69]:
### Create empty columns for output file (ignore warnings)

df['Status'] = "pending" # All rows start as 'pending'; later set to 'error', 'complete', or 'skipped' (meaning previously logged within 180 days)
df['Definitive'] = "" # To be replaced with 'yes' or 'no'
df['Gene QID'] = "" # To be replaced with the gene Qid, or with 'absent' or 'multiple'
df['Disease QID'] = "" # To be replaced with the disease Qid, or with 'absent' or 'multiple'

df.head(6)

Unnamed: 0,Gene,HGNC Gene ID,Disease,MONDO Disease ID,SOP,Classification,Report Reference URL,Report Date,Status,Definitive,Gene QID,Disease QID
0,A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:37:47.175Z,pending,,,
1,A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:31:03.696Z,pending,,,
2,A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:34:05.324Z,pending,,,
3,A2ML1,HGNC:23336,Noonan syndrome,MONDO_0018997,SOP5,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:23:53.157Z,pending,,,
4,A2ML1,HGNC:23336,Noonan syndrome-like disorder with loose anage...,MONDO_0011899,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:40:11.599Z,pending,,,
5,AARS,HGNC:20,undetermined early-onset epileptic encephalopathy,MONDO_0018614,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2018-11-20T17:00:00.000Z,pending,,,


In [70]:
### For loop that executes the following through each row of the dataframe 

start_time = time.time() # Keep track of how long it takes loop to run

for index, row in df.iterrows(): # Index is a row number, row is all variables and values for that row
        
    # Get the gene symbol and the MONDO disease ID for the current row
    HGNC = df.loc[index, 'Gene']
    MONDO = df.loc[index, 'MONDO Disease ID'].replace("_", ":") # .replace() changes _ to : for the SPARQL query

    # SPARQL queries that look up the gene by HGNC gene symbol (P353) and the disease by MONDO ID (P5270) in Wikidata
    sparqlQuery_HGNC = "SELECT * WHERE {?gene wdt:P353 \""+HGNC+"\"}"
    result_HGNC = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery_HGNC) # Query result
    sparqlQuery_MONDO = "SELECT * WHERE {?disease wdt:P5270 \""+MONDO+"\"}"
    result_MONDO = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery_MONDO)

    # Count the matching items (number of Qids) for the gene and for the disease
    HGNC_qlength = len(result_HGNC["results"]["bindings"])
    MONDO_qlength = len(result_MONDO["results"]["bindings"])

    # Record the lookup outcome in the output table
    # Accounts for scenarios where there's no gene Qid but there is a disease Qid, etc.
    if HGNC_qlength == 1:
        HGNC_qid = result_HGNC["results"]["bindings"][0]["gene"]["value"].replace("http://www.wikidata.org/entity/", "")
        df.at[index, 'Gene QID'] = HGNC_qid # Input HGNC Qid in 'Gene QID' cell  
    if HGNC_qlength < 1: # If no Qid
        df.at[index, 'Status'] = "error" 
        df.at[index, 'Gene QID'] = "absent"  
    if HGNC_qlength > 1: # If multiple Qid
        df.at[index, 'Status'] = "error" 
        df.at[index, 'Gene QID'] = "multiple"
        
    if MONDO_qlength == 1:
        MONDO_qid = result_MONDO["results"]["bindings"][0]["disease"]["value"].replace("http://www.wikidata.org/entity/", "") 
        df.at[index, 'Disease QID'] = MONDO_qid  
    if MONDO_qlength < 1: 
        df.at[index, 'Status'] = "error" 
        df.at[index, 'Disease QID'] = "absent" 
    if MONDO_qlength > 1:
        df.at[index, 'Status'] = "error" 
        df.at[index, 'Disease QID'] = "multiple" 
        
    # Mark rows whose Classification is not 'Definitive' as errors, then skip them
    if row['Classification'] != 'Definitive': # If the Classification column is NOT 'Definitive'
        df.at[index, 'Status'] = "error" # Then input "error" in the Status column
        df.at[index, 'Definitive'] = "no" # And 'no' in the Definitive column
        continue
    else: # Otherwise
        df.at[index, 'Definitive'] = "yes" # Input 'yes' in the Definitive column and go on to the next step

    # Continue to write into Wikidata only if there is exactly 1 Qid for each + Definitive classification
    if HGNC_qlength == 1 and MONDO_qlength == 1:
        
        # Call upon create_reference() function created   
        # reference = create_reference() 
        
        # Add disease value to gene item page, and gene value to disease item page (symmetry)
        #statement_HGNC = [wdi_core.WDItemID(value=MONDO_qid, prop_nr="P2293", references=[copy.deepcopy(reference)])] # Creates 'genetic association' statement (P2293) whether or not it's already there, and includes the references
        #wikidata_HGNCitem = wdi_core.WDItemEngine(wd_item_id=HGNC_qid, data=statement_HGNC, ref_handler=update_retrieved_if_new_multiple_refs, append_value=["P2293"])
        #wikidata_HGNCitem.get_wd_json_representation() # Gives the JSON structure that is submitted to the API, helpful for debugging
    
        #statement_MONDO = [wdi_core.WDItemID(value=HGNC_qid, prop_nr="P2293", references=[copy.deepcopy(reference)])] # Symmetry for disease item page
        #wikidata_MONDOitem = wdi_core.WDItemEngine(wd_item_id=MONDO_qid, data=statement_MONDO, ref_handler=update_retrieved_if_new_multiple_refs, append_value=["P2293"])
        #wikidata_MONDOitem.get_wd_json_representation()
    
        #print(colored(HGNC,"blue"), "Gene successfully logged as", colored(wikidata_HGNCitem.write(login),"blue"), "and", colored(MONDO,"green"), "Disease successfully logged as", colored(wikidata_MONDOitem.write(login),"green"))
        df.at[index, 'Status'] = "complete"
        
        
end_time = time.time() # Captures when loop run ends
print("The total time of this loop is:", end_time - start_time, "seconds, or", (end_time - start_time)/60, "minutes") 

# Write output to a .csv file
now = datetime.now() # Retrieves current time and saves it as 'now'
# File name includes an ISO 8601 time stamp, YYYY-MM-DDTHH:MM:SS.ffffff (https://en.wikipedia.org/wiki/ISO_8601)
df.to_csv("ClinGenBot_Status-Output_" + now.isoformat() + ".csv")  # isoformat
df.head(20)

The total time of this loop is: 152.85655808448792 seconds, or 2.547609301408132 minutes


Unnamed: 0,Gene,HGNC Gene ID,Disease,MONDO Disease ID,SOP,Classification,Report Reference URL,Report Date,Status,Definitive,Gene QID,Disease QID
0,A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:37:47.175Z,error,no,Q18051234,absent
1,A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:31:03.696Z,error,no,Q18051234,absent
2,A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:34:05.324Z,error,no,Q18051234,Q1136492
3,A2ML1,HGNC:23336,Noonan syndrome,MONDO_0018997,SOP5,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:23:53.157Z,error,no,Q18051234,absent
4,A2ML1,HGNC:23336,Noonan syndrome-like disorder with loose anage...,MONDO_0011899,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-07T14:40:11.599Z,error,no,Q18051234,Q55783530
5,AARS,HGNC:20,undetermined early-onset epileptic encephalopathy,MONDO_0018614,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2018-11-20T17:00:00.000Z,error,no,absent,Q56014174
6,ABCC9,HGNC:60,hypertrichotic osteochondrodysplasia Cantu type,MONDO_0009406,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-09-27T00:00:00,complete,yes,Q18034993,Q5034093
7,ABCD1,HGNC:61,X-linked cerebral adrenoleukodystrophy,MONDO_0010247,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2018-02-07T14:00:00,complete,yes,Q14912808,Q55345732
8,ABHD12,HGNC:15868,PHARC syndrome,MONDO_0012984,SOP5,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2018-06-28T16:45:15.791Z,complete,yes,Q18038087,Q32137273
9,ACAD8,HGNC:87,isobutyryl-CoA dehydrogenase deficiency,MONDO_0012648,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-04-26T16:00:00.000Z,complete,yes,Q18038564,Q6085391
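
The three-way outcome recorded in the 'Gene QID' and 'Disease QID' columns (a Qid, 'absent', or 'multiple') can be isolated in a pair of small functions. This is a sketch under stated assumptions: `build_query` and `classify_bindings` are hypothetical helpers, and the result dictionary below is hand-written to mimic the `["results"]["bindings"]` structure used in the loop rather than fetched from Wikidata.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def build_query(variable, prop, value):
    """Build a single-property lookup, e.g. ?gene wdt:P353 "A2ML1"."""
    return 'SELECT * WHERE {?%s wdt:%s "%s"}' % (variable, prop, value)


def classify_bindings(result, variable):
    """Return the Qid if exactly one item matched, else 'absent' or 'multiple'."""
    bindings = result["results"]["bindings"]
    if not bindings:
        return "absent"
    if len(bindings) > 1:
        return "multiple"
    return bindings[0][variable]["value"].replace(ENTITY_PREFIX, "")


# Hand-written stand-in for a query result (not fetched from Wikidata)
fake_result = {"results": {"bindings": [
    {"gene": {"type": "uri", "value": ENTITY_PREFIX + "Q18051234"}}
]}}
print(build_query("gene", "P353", "A2ML1"))
print(classify_bindings(fake_result, "gene"))
```

Keeping the classification in one function means the gene and disease lookups share the same rules, and the function can be checked without network access.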


### Switch to the entire dataframe (replace subsetdf with df throughout) once the bot's output is approved as satisfactory

In [64]:
subsetdf = df[25:35] # Take rows 25-34 as a test subset named subsetdf
subsetdf

Unnamed: 0,Gene,HGNC Gene ID,Disease,MONDO Disease ID,SOP,Classification,Report Reference URL,Report Date,Status,Definitive,Gene QID,Disease QID
25,ADCY1,HGNC:232,autosomal recessive nonsyndromic deafness,MONDO_0019588,SOP4,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2017-05-10T00:00:00,pending,,,
26,ADGRV1,HGNC:17416,Usher syndrome type 2C,MONDO_0011558,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-02-15T00:00:00,pending,,,
27,ADGRV1,HGNC:17416,nonsyndromic genetic deafness,MONDO_0019497,SOP6,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2019-03-19T16:00:00.000Z,pending,,,
28,AFF2,HGNC:3776,FRAXE intellectual disability,MONDO_0010659,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-10-20T00:00:00,pending,,,
29,AGPS,HGNC:327,rhizomelic chondrodysplasia punctata,MONDO_0015776,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-10-04T20:27:42.154Z,pending,,,
30,AGTR2,HGNC:338,non-syndromic X-linked intellectual disability,MONDO_0019181,SOP4,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2016-11-16T00:00:00,pending,,,
31,AIFM1,HGNC:8768,X-linked hereditary sensory and autonomic neur...,MONDO_0010378,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2018-07-09T16:00:00.000Z,pending,,,
32,AKAP9,HGNC:379,long QT syndrome 11,MONDO_0012738,SOP4,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2016-12-15T00:00:00,pending,,,
33,ALDH4A1,HGNC:406,hyperprolinemia type 2,MONDO_0009401,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-vali...,2019-09-13T17:27:15.767Z,pending,,,
34,ALDH7A1,HGNC:877,pyridoxine-dependent epilepsy,MONDO_0009945,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-07-26T17:09:04.469Z,pending,,,


### For loop that iterates across the dataframe and uploads to Wikidata

In [65]:
### Create the function "create_reference()", which builds the references and is called on each pass of the loop

def create_reference(): 
    # Relies on the globals 'subsetdf', 'index' (the current loop row), and 'timeStringNow'
    refStatedIn = wdi_core.WDItemID(value="Q64403342", prop_nr="P248", is_reference=True) # 'stated in' (P248) = ClinGen (Q64403342)
    refRetrieved = wdi_core.WDTime(timeStringNow, prop_nr="P813", is_reference=True) # 'retrieved' (P813), using the 'timeStringNow' string created earlier
    refURL = wdi_core.WDUrl(subsetdf.loc[index, 'Report Reference URL'], prop_nr="P854", is_reference=True) # 'reference URL' (P854)
    return [refStatedIn, refRetrieved, refURL]

In [66]:
### Create empty columns for output file (ignore the SettingWithCopyWarning output; creating the subset with df[25:35].copy() would avoid it)

subsetdf['Status'] = "pending" # All rows start as 'pending'; later set to 'error', 'complete', or 'skipped' (meaning previously logged within 180 days)
subsetdf['Definitive'] = "" # To be replaced with 'yes' or 'no'
subsetdf['Gene QID'] = "" # To be replaced with the gene Qid, or with 'absent' or 'multiple'
subsetdf['Disease QID'] = "" # To be replaced with the disease Qid, or with 'absent' or 'multiple'

subsetdf

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Unnamed: 0,Gene,HGNC Gene ID,Disease,MONDO Disease ID,SOP,Classification,Report Reference URL,Report Date,Status,Definitive,Gene QID,Disease QID
25,ADCY1,HGNC:232,autosomal recessive nonsyndromic deafness,MONDO_0019588,SOP4,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2017-05-10T00:00:00,pending,,,
26,ADGRV1,HGNC:17416,Usher syndrome type 2C,MONDO_0011558,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-02-15T00:00:00,pending,,,
27,ADGRV1,HGNC:17416,nonsyndromic genetic deafness,MONDO_0019497,SOP6,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2019-03-19T16:00:00.000Z,pending,,,
28,AFF2,HGNC:3776,FRAXE intellectual disability,MONDO_0010659,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-10-20T00:00:00,pending,,,
29,AGPS,HGNC:327,rhizomelic chondrodysplasia punctata,MONDO_0015776,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-10-04T20:27:42.154Z,pending,,,
30,AGTR2,HGNC:338,non-syndromic X-linked intellectual disability,MONDO_0019181,SOP4,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2016-11-16T00:00:00,pending,,,
31,AIFM1,HGNC:8768,X-linked hereditary sensory and autonomic neur...,MONDO_0010378,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2018-07-09T16:00:00.000Z,pending,,,
32,AKAP9,HGNC:379,long QT syndrome 11,MONDO_0012738,SOP4,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2016-12-15T00:00:00,pending,,,
33,ALDH4A1,HGNC:406,hyperprolinemia type 2,MONDO_0009401,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-vali...,2019-09-13T17:27:15.767Z,pending,,,
34,ALDH7A1,HGNC:877,pyridoxine-dependent epilepsy,MONDO_0009945,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-07-26T17:09:04.469Z,pending,,,


In [67]:
### For loop that executes the following through each row of the dataframe 

start_time = time.time() # Keep track of how long it takes loop to run

for index, row in subsetdf.iterrows(): # Index is a row number, row is all variables and values for that row
        
    # Get the gene symbol and the MONDO disease ID for the current row
    HGNC = subsetdf.loc[index, 'Gene']
    MONDO = subsetdf.loc[index, 'MONDO Disease ID'].replace("_", ":") # .replace() changes _ to : for the SPARQL query

    # SPARQL queries that look up the gene by HGNC gene symbol (P353) and the disease by MONDO ID (P5270) in Wikidata
    sparqlQuery_HGNC = "SELECT * WHERE {?gene wdt:P353 \""+HGNC+"\"}"
    result_HGNC = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery_HGNC) # Query result
    sparqlQuery_MONDO = "SELECT * WHERE {?disease wdt:P5270 \""+MONDO+"\"}"
    result_MONDO = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery_MONDO)

    # Count the matching items (number of Qids) for the gene and for the disease
    HGNC_qlength = len(result_HGNC["results"]["bindings"])
    MONDO_qlength = len(result_MONDO["results"]["bindings"])

    # Record the lookup outcome in the output table
    # Accounts for scenarios where there's no gene Qid but there is a disease Qid, etc.
    if HGNC_qlength == 1:
        HGNC_qid = result_HGNC["results"]["bindings"][0]["gene"]["value"].replace("http://www.wikidata.org/entity/", "")
        subsetdf.at[index, 'Gene QID'] = HGNC_qid # Input HGNC Qid in 'Gene QID' cell  
    if HGNC_qlength < 1: # If no Qid
        subsetdf.at[index, 'Status'] = "error" 
        subsetdf.at[index, 'Gene QID'] = "absent"  
    if HGNC_qlength > 1: # If multiple Qid
        subsetdf.at[index, 'Status'] = "error" 
        subsetdf.at[index, 'Gene QID'] = "multiple"
        
    if MONDO_qlength == 1:
        MONDO_qid = result_MONDO["results"]["bindings"][0]["disease"]["value"].replace("http://www.wikidata.org/entity/", "") 
        subsetdf.at[index, 'Disease QID'] = MONDO_qid  
    if MONDO_qlength < 1: 
        subsetdf.at[index, 'Status'] = "error" 
        subsetdf.at[index, 'Disease QID'] = "absent" 
    if MONDO_qlength > 1:
        subsetdf.at[index, 'Status'] = "error" 
        subsetdf.at[index, 'Disease QID'] = "multiple" 
        
    # Mark rows whose Classification is not 'Definitive' as errors, then skip them
    if row['Classification'] != 'Definitive': # If the Classification column is NOT 'Definitive'
        subsetdf.at[index, 'Status'] = "error" # Then input "error" in the Status column
        subsetdf.at[index, 'Definitive'] = "no" # And 'no' in the Definitive column
        continue
    else: # Otherwise
        subsetdf.at[index, 'Definitive'] = "yes" # Input 'yes' in the Definitive column and go on to the next step

    # Continue to write into Wikidata only if there is exactly 1 Qid for each + Definitive classification
    if HGNC_qlength == 1 and MONDO_qlength == 1:
        
        # Call upon create_reference() function created   
        reference = create_reference() 
        
        # Add disease value to gene item page, and gene value to disease item page (symmetry)
        statement_HGNC = [wdi_core.WDItemID(value=MONDO_qid, prop_nr="P2293", references=[copy.deepcopy(reference)])] # Creates 'genetic association' statement (P2293) whether or not it's already there, and includes the references
        wikidata_HGNCitem = wdi_core.WDItemEngine(wd_item_id=HGNC_qid, data=statement_HGNC, ref_handler=update_retrieved_if_new_multiple_refs, append_value=["P2293"])
        wikidata_HGNCitem.get_wd_json_representation() # Gives the JSON structure that is submitted to the API, helpful for debugging
    
        statement_MONDO = [wdi_core.WDItemID(value=HGNC_qid, prop_nr="P2293", references=[copy.deepcopy(reference)])] # Symmetry for disease item page
        wikidata_MONDOitem = wdi_core.WDItemEngine(wd_item_id=MONDO_qid, data=statement_MONDO, ref_handler=update_retrieved_if_new_multiple_refs, append_value=["P2293"])
        wikidata_MONDOitem.get_wd_json_representation()

    
        print(colored(HGNC,"blue"), "Gene successfully logged as", colored(wikidata_HGNCitem.write(login),"blue"), "and", colored(MONDO,"green"), "Disease successfully logged as", colored(wikidata_MONDOitem.write(login),"green"))
        subsetdf.at[index, 'Status'] = "complete" 
        
end_time = time.time() # Captures when loop run ends
print("The total time of this loop is:", end_time - start_time, "seconds, or", (end_time - start_time)/60, "minutes")

# Write output to a .csv file
now = datetime.now() # Retrieves current time and saves it as 'now'
# File name includes an ISO 8601 time stamp, YYYY-MM-DDTHH:MM:SS.ffffff (https://en.wikipedia.org/wiki/ISO_8601)
subsetdf.to_csv("ClinGenBot_Status-Output_" + now.isoformat() + ".csv")  # isoformat
subsetdf

[34mADGRV1[0m Gene successfully logged as [34mQ18047368[0m and [32mMONDO:0011558[0m Disease successfully logged as [32mQ32143643[0m
[34mAFF2[0m Gene successfully logged as [34mQ17928899[0m and [32mMONDO:0010659[0m Disease successfully logged as [32mQ21051307[0m
[34mALDH7A1[0m Gene successfully logged as [34mQ17833695[0m and [32mMONDO:0009945[0m Disease successfully logged as [32mQ7263591[0m
The total time of this loop is: 15.587664365768433 seconds, or 0.2597944060961405 minutes


Unnamed: 0,Gene,HGNC Gene ID,Disease,MONDO Disease ID,SOP,Classification,Report Reference URL,Report Date,Status,Definitive,Gene QID,Disease QID
25,ADCY1,HGNC:232,autosomal recessive nonsyndromic deafness,MONDO_0019588,SOP4,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2017-05-10T00:00:00,error,no,Q17709600,absent
26,ADGRV1,HGNC:17416,Usher syndrome type 2C,MONDO_0011558,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-02-15T00:00:00,complete,yes,Q18047368,Q32143643
27,ADGRV1,HGNC:17416,nonsyndromic genetic deafness,MONDO_0019497,SOP6,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2019-03-19T16:00:00.000Z,error,no,Q18047368,Q9079046
28,AFF2,HGNC:3776,FRAXE intellectual disability,MONDO_0010659,SOP4,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2017-10-20T00:00:00,complete,yes,Q17928899,Q21051307
29,AGPS,HGNC:327,rhizomelic chondrodysplasia punctata,MONDO_0015776,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-10-04T20:27:42.154Z,error,yes,Q18033087,absent
30,AGTR2,HGNC:338,non-syndromic X-linked intellectual disability,MONDO_0019181,SOP4,Disputed,https://search.clinicalgenome.org/kb/gene-vali...,2016-11-16T00:00:00,error,no,Q17816212,absent
31,AIFM1,HGNC:8768,X-linked hereditary sensory and autonomic neur...,MONDO_0010378,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2018-07-09T16:00:00.000Z,error,yes,Q18033855,absent
32,AKAP9,HGNC:379,long QT syndrome 11,MONDO_0012738,SOP4,Limited,https://search.clinicalgenome.org/kb/gene-vali...,2016-12-15T00:00:00,error,no,Q18035065,Q32139811
33,ALDH4A1,HGNC:406,hyperprolinemia type 2,MONDO_0009401,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-vali...,2019-09-13T17:27:15.767Z,error,no,Q18033190,absent
34,ALDH7A1,HGNC:877,pyridoxine-dependent epilepsy,MONDO_0009945,SOP6,Definitive,https://search.clinicalgenome.org/kb/gene-vali...,2019-07-26T17:09:04.469Z,complete,yes,Q17833695,Q7263591
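
One detail worth illustrating for the final write condition: in Python, `&` binds more tightly than `==`, so combining two count checks with `&` does not always behave like `and`. The short demonstration below uses made-up counts, and `should_write` is a hypothetical helper rather than part of the notebook.

```python
def should_write(gene_count, disease_count):
    """True only when exactly one gene item and one disease item were found."""
    return gene_count == 1 and disease_count == 1


# With `&`, the test parses as `gene_count == (1 & disease_count) == 1`.
# For disease_count = 3, `1 & 3` is 1, so the chained comparison is True
# even though three disease items were found.
gene_count, disease_count = 1, 3
print(gene_count == 1 & disease_count == 1)     # True
print(should_write(gene_count, disease_count))  # False
```

Using `and` keeps the write step limited to rows with exactly one Qid on each side.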
