### COVID-19 Graph CVD ICD11 Corona Symptom Analysis

This notebook looks up ICD11 symptom and descriptor terms in the publications stored in the covidgraph.org graph and collects the papers that mention each term in connection with coronavirus.

#### Authentication to access the covidgraph.org graph

In [1]:
import pandas as pd
import json
from neo4j import GraphDatabase
# Note: APOC is a Neo4j server-side plugin called from Cypher; it is not imported in Python.

In [2]:
covid_browser = "https://covid.petesis.com:7473"  # Neo4j Browser URL, kept for reference (not used below)
covid_url = "bolt://covid.petesis.com:7687"       # Bolt endpoint used by the driver
user = "public"
password = "corona"

# Open a driver against the Bolt endpoint with the public credentials.
driver = GraphDatabase.driver(covid_url, auth=(user, password))

#### The queries below search for the symptom and descriptor terms defined in the cell preceding each query
- For each ICD11 code, a list of all its associated symptom and descriptor names is created.
- In a loop, each name is queried, and every matching paper is stored as a dictionary with 5 main publication attributes (journal, publish time, source, title, and url).
- The papers found for a name are grouped in a dictionary keyed by that name, and this dictionary is appended to a list that covers all names for the code.
- This data is then written to a `json` file named by its ICD11 code.
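
The steps above can be sketched without a database connection. This is a minimal illustration, not the notebook's actual query code: `collect_papers`, `FIELDS`, and `fake_run_query` are hypothetical names introduced here, and the sample rows are made-up placeholders rather than results from the graph. The callable `run_query` stands in for whatever executes the Cypher query for one name and yields one mapping per matching paper.

```python
from typing import Callable, Dict, Iterable, List

# The five publication attributes kept for each paper.
FIELDS = ("journal", "publish_time", "source", "title", "url")


def collect_papers(
    run_query: Callable[[str], Iterable[dict]],
    names: Iterable[str],
) -> List[Dict[str, List[dict]]]:
    """Return a list with one {name: [paper, ...]} dictionary per name.

    `run_query` is a hypothetical stand-in for the graph lookup: it takes
    a name and yields one mapping per matching paper.
    """
    results = []
    for name in names:
        # Keep only the five attributes; missing ones become None.
        papers = [{field: row.get(field) for field in FIELDS}
                  for row in run_query(name)]
        results.append({name: papers})
    return results


def fake_run_query(name: str) -> List[dict]:
    """Placeholder data source used only for this illustration."""
    sample = {
        "term a": [
            {"journal": "Journal X", "publish_time": "2005-01-10",
             "source": "PMC", "title": "Example title",
             "url": "https://example.org/1", "extra": "ignored"},
        ],
    }
    return sample.get(name, [])


demo = collect_papers(fake_run_query, ["term a", "term b"])
print(demo)
```

A name with no matching papers still gets an entry (with an empty list), so the output records every name that was searched.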

**Corona disease and symptom terms from ICD11 are matched against the text of the 'BodyText' nodes in the graph.**

#### ICD11 Code: XN83D

In [3]:
# Sanity check: fetch one paper whose body text mentions 'coronavirus'.
query = """
MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)
      -[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText)
WHERE LOWER(a.text) CONTAINS 'coronavirus'
RETURN p LIMIT 1
"""
with driver.session() as session:
    info = session.run(query)
    for item in info:
        print(item)

<Record p=<Node id=66685 labels={'Paper'} properties={'cord_uid': 'imbxofkp', 'cord19-fulltext_hash': '276d1d1c20336ca2a6f54c7a95507001917e4c44', 'journal': 'Emerg Infect Dis', 'publish_time': '2005-01-10', 'source': 'PMC', 'title': 'Tracing SARS-Coronavirus Variant with Large Genomic Deletion', '_hash_id': '9986e9e7e5fd88596118e63d8adb8233', 'url': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3294368/'}>>


In [4]:
# Lower-case names, matched against LOWER(a.text) in the query below.
entities_xn = [
    'human coronavirus 229e',
    'human coronavirus hku1',
    'human coronavirus oc43',
    'middle east respiratory syndrome coronavirus',
    'pipistrellus bat coronavirus hku5',
    'rousettus bat coronavirus hku9',
    'severe acute respiratory syndrome coronavirus',
    'tylonycteris bat coronavirus hku4',
]

In [5]:
# One parameterized query is reused for every entity; passing the term as
# $entity avoids building the Cypher string by concatenation.
query = """
MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)
      -[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText)
WHERE LOWER(a.text) CONTAINS $entity
RETURN DISTINCT p.journal AS journal, p.publish_time AS publish_time,
                p.source AS source, p.title AS title, p.url AS url
"""

result_xn = []
for entity in entities_xn:
    entity_result = []
    with driver.session() as session:
        for item in session.run(query, entity=entity):
            # Read each column by name instead of by position.
            entity_result.append({"journal": item["journal"],
                                  "publish_time": item["publish_time"],
                                  "source": item["source"],
                                  "title": item["title"],
                                  "url": item["url"]})

    result_xn.append({entity: entity_result})

In [6]:
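import os
os.makedirs("output/coronavirus", exist_ok=True)  # open() does not create missing directories
with open("output/coronavirus/ICD11-XN83D.json", 'w') as xn:
    json.dump(result_xn, xn)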
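
Once the file is written, it can be useful to check how many papers were found per name. The sketch below assumes the structure produced above (a list of single-key dictionaries, each mapping a name to its list of papers). `papers_per_entity` is a hypothetical helper, and the sample records are placeholders rather than real query results; the JSON round trip stands in for reading the file back with `json.load`.

```python
import json


def papers_per_entity(records):
    """Count papers per name in a list of {name: [paper, ...]} dictionaries."""
    counts = {}
    for record in records:
        for entity, papers in record.items():
            counts[entity] = counts.get(entity, 0) + len(papers)
    return counts


# Placeholder data in the same shape as the written file.
sample_records = [
    {"term a": [{"title": "Paper 1"}, {"title": "Paper 2"}]},
    {"term b": []},
]

# Round-trip through JSON, as if the records had been dumped and loaded again.
loaded = json.loads(json.dumps(sample_records))
counts = papers_per_entity(loaded)
print(counts)
```

For the real data, `loaded` would instead come from `json.load` on the `ICD11-XN83D.json` file.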