### MeSH-Based Query Search

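In this notebook, we show how to search documents for the diseases described in the CVD tree. Coronavirus-related MeSH terms are paired with heart disease MeSH terms, and the covidgraph.org graph is queried for papers whose body text mentions both.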

In [3]:
import pandas as pd
import json
from neo4j import GraphDatabase
import csv

#### Authenticating to the covidgraph.org graph

In [4]:
covid_browser = "https://covid.petesis.com:7473"  # Neo4j Browser (web UI), for manual exploration
covid_url = "bolt://covid.petesis.com:7687"       # Bolt endpoint used by the Python driver
user = "public"
password = "corona"

# One driver instance is reused for every session opened below
driver = GraphDatabase.driver(covid_url, auth=(user, password))
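The credentials above are public, but if the notebook is pointed at a private graph it is safer to read them from the environment. A minimal sketch follows; the variable names `COVIDGRAPH_URI`, `COVIDGRAPH_USER` and `COVIDGRAPH_PASSWORD` and the helper `load_credentials` are hypothetical, with the public values kept as fallbacks.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str, str]:
    """Return (uri, user, password), falling back to the public covidgraph.org values.

    The environment variable names are hypothetical, chosen for this sketch.
    """
    uri = env.get("COVIDGRAPH_URI", "bolt://covid.petesis.com:7687")
    user = env.get("COVIDGRAPH_USER", "public")
    password = env.get("COVIDGRAPH_PASSWORD", "corona")
    return uri, user, password


# Usage (requires the neo4j package):
# uri, user, password = load_credentials()
# driver = GraphDatabase.driver(uri, auth=(user, password))
```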

#### Mapping each MeSH descriptor to its entity list
- Example: ```C01.925.782.600.550.200.360: [feline infectious peritonitis]```
- A pandas DataFrame is convenient for handling a CSV file, particularly for transforming columns with a ```lambda``` mapping function.
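The lowercase/strip and split-on-comma steps applied below with two lambdas can be folded into one plain function, which is easy to test without pandas. This is a sketch; the name `to_phrases` is hypothetical, and unlike a bare `split(',')` it also strips each part, so `"Angina, Unstable"` yields `"unstable"` rather than `" unstable"`.

```python
from typing import List


def to_phrases(name: str) -> List[str]:
    """Lowercase a MeSH name and split it on commas into whitespace-trimmed phrases."""
    # Same steps as the lambdas (lower + strip, then split on ','),
    # plus a strip of every part to drop the space that follows each comma.
    return [part.strip() for part in name.lower().strip().split(",")]
```

It could then be applied column-wise, e.g. `MeSH_corona['phrases'] = MeSH_corona['name'].apply(to_phrases)`.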

#### First, obtain the MeSH terms related to coronaviruses

In [5]:
MeSH_corona = pd.read_csv("input/mesh/corona.csv")
MeSH_corona = MeSH_corona.set_index('ID')
MeSH_corona.head()

ID,name
C01.925.782.600.550.200,Coronavirus Infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome
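The `ID` values are MeSH tree numbers: dot-separated paths in which each additional segment is one level deeper, so every row above sits under `C01.925.782.600.550.200` (Coronavirus Infections). A small sketch for checking that relationship, with hypothetical helper names `is_descendant` and `depth`:

```python
def is_descendant(tree_number: str, root: str) -> bool:
    """True if tree_number equals root or lies below it in the MeSH hierarchy."""
    # Compare whole dot-separated segments, so "C01.9" is not treated as a parent of "C01.925".
    return tree_number == root or tree_number.startswith(root + ".")


def depth(tree_number: str) -> int:
    """Number of dot-separated segments, i.e. the level of the entry in the tree."""
    return len(tree_number.split("."))
```

Filtering on `is_descendant` would let one CSV of a larger tree be cut down to a single subtree without maintaining separate files.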


- Use a ```lambda``` function to derive a new ```phrases``` column from ```name``` (lowercased and stripped), then split each phrase on commas

In [6]:
MeSH_corona['phrases'] = MeSH_corona['name'].apply(lambda x: x.lower().strip())

In [7]:
MeSH_corona.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,coronavirus infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","enteritis, transmissible, of turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,feline infectious peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","gastroenteritis, transmissible, of swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,severe acute respiratory syndrome


In [8]:
MeSH_corona['phrases'] = MeSH_corona['phrases'].apply(lambda x:x.split(','))

In [9]:
MeSH_corona.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,[coronavirus infections]
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","[enteritis, transmissible, of turkeys]"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,[feline infectious peritonitis]
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","[gastroenteritis, transmissible, of swine]"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,[severe acute respiratory syndrome]


#### Then obtain the MeSH terms related to heart disease

In [10]:
MeSH_heart = pd.read_csv("input/mesh/heart_disease.csv")
MeSH_heart = MeSH_heart.set_index('ID')
MeSH_heart.head()

ID,name
C14.280.647,Myocardial Ischemia
C14.280.647.124,Acute Coronary Syndrome
C14.280.647.187,Angina Pectoris
C14.280.647.187.150,"Angina, Unstable"
C14.280.647.187.150.150,"Angina Pectoris, Variant"


- Apply the same two ```lambda``` transformations (lowercase and strip, then split on commas) to the heart disease names

In [11]:
MeSH_heart['phrases'] = MeSH_heart['name'].apply(lambda x: x.lower().strip())

In [12]:
MeSH_heart['phrases'] = MeSH_heart['phrases'].apply(lambda x:x.split(','))

In [13]:
MeSH_heart.head()

ID,name,phrases
C14.280.647,Myocardial Ischemia,[myocardial ischemia]
C14.280.647.124,Acute Coronary Syndrome,[acute coronary syndrome]
C14.280.647.187,Angina Pectoris,[angina pectoris]
C14.280.647.187.150,"Angina, Unstable","[angina, unstable]"
C14.280.647.187.150.150,"Angina Pectoris, Variant","[angina pectoris, variant]"


#### Combine both term sets
- Pair every coronavirus MeSH entry with every heart disease MeSH entry (a Cartesian product of the two ```phrases``` columns)

In [14]:
all_ = [(x, y) for x in MeSH_corona['phrases'] for y in MeSH_heart['phrases']]

In [15]:
all_[0:4]

[(['coronavirus infections'], ['myocardial ischemia']),
 (['coronavirus infections'], ['acute coronary syndrome']),
 (['coronavirus infections'], ['angina pectoris']),
 (['coronavirus infections'], ['angina', ' unstable'])]
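The nested list comprehension is a Cartesian product, which the standard library expresses directly as `itertools.product`. The sketch below (hypothetical helper `pair_terms`) produces the same pairs in the same order, with the coronavirus entry varying slowest.

```python
from itertools import product
from typing import List, Sequence, Tuple


def pair_terms(
    corona_phrases: Sequence[List[str]], heart_phrases: Sequence[List[str]]
) -> List[Tuple[List[str], List[str]]]:
    """Every (coronavirus, heart disease) phrase-list pair, same order as the comprehension."""
    return list(product(corona_phrases, heart_phrases))
```

Either form yields `len(MeSH_corona) * len(MeSH_heart)` pairs, and each pair becomes one graph query below, so the query count grows multiplicatively with the size of the two term lists.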

#### MeSH to Doc Mapping
- Create a dictionary where the key identifies a (coronavirus, heart disease) pair of MeSH terms, and the value is a list of papers (publications) that mention all of those terms in their body text
- Each paper is represented as a dictionary mapping each attribute name of the paper node (cord_uid, journal, title, etc.) to its value
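The matching rule behind this mapping (a paper matches when its lowercased text contains every term of the pair) can be mirrored in plain Python to check its behaviour offline. This is a sketch under stated assumptions: the `body` field and the helper names `matches_all` and `build_mapping` are hypothetical, and each paper's text is treated as one string, whereas the graph query requires all terms to occur within a single `BodyText` node.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def matches_all(text: str, terms: Sequence[str]) -> bool:
    """Case-insensitive substring test: True only if every term occurs in text."""
    lowered = text.lower()
    return all(term.strip().lower() in lowered for term in terms)


def build_mapping(
    pairs: Iterable[Tuple[List[str], List[str]]],
    papers: List[dict],
) -> Dict[str, List[dict]]:
    """Map 'corona phrases | heart phrases' to the papers whose body matches every term."""
    mapping: Dict[str, List[dict]] = {}
    for corona, heart in pairs:
        corona_terms = [t.strip() for t in corona]
        heart_terms = [t.strip() for t in heart]
        hits = [
            {k: v for k, v in paper.items() if k != "body"}  # keep metadata, drop the text
            for paper in papers
            if matches_all(paper["body"], corona_terms + heart_terms)
        ]
        if hits:  # only pairs with at least one matching paper get a key
            key = ", ".join(corona_terms) + " | " + ", ".join(heart_terms)
            mapping[key] = hits
    return mapping
```

Two consequences of substring matching are worth keeping in mind: splitting `"Angina, Unstable"` on commas turns it into two independent conditions (word order no longer matters), and a short term such as `angina` also matches longer words like `anginal`.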

##### Example of a paper node in the covid graph

In [16]:
# Fetch a single Paper node to inspect which properties it carries
paper_query = "MATCH (n:Paper) RETURN n LIMIT 1"
with driver.session() as session:
    for record in session.run(paper_query):
        print(record)

<Record n=<Node id=3198 labels={'Paper'} properties={'cord_uid': 'zrmkq3mz', 'cord19-fulltext_hash': '41c7a01f11ed47591d99f45774e43e45aeba0619', 'journal': 'BMC Microbiol', 'publish_time': '2009-08-12', 'source': 'PMC', 'title': 'CAPIH: A Web interface for comparative analyses and visualization of host-HIV protein-protein interactions', '_hash_id': '3c4b2ee1430dc9ac53aca87c0fc0f7eb', 'url': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2782265/'}>>
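Not every property is needed in the output file; internal fields such as `_hash_id` or `cord19-fulltext_hash` could be dropped. A minimal sketch, using the properties printed above as input; the `KEEP` selection and the helper name `project` are hypothetical choices.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical selection of Paper properties worth keeping in the JSON output
KEEP: Tuple[str, ...] = ("cord_uid", "title", "journal", "publish_time", "url")

# Properties of the Paper node printed above
paper_props = {
    "cord_uid": "zrmkq3mz",
    "cord19-fulltext_hash": "41c7a01f11ed47591d99f45774e43e45aeba0619",
    "journal": "BMC Microbiol",
    "publish_time": "2009-08-12",
    "source": "PMC",
    "title": "CAPIH: A Web interface for comparative analyses and visualization of host-HIV protein-protein interactions",
    "_hash_id": "3c4b2ee1430dc9ac53aca87c0fc0f7eb",
    "url": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2782265/",
}


def project(props: Mapping[str, object], keep: Tuple[str, ...] = KEEP) -> Dict[str, object]:
    """Keep only the selected properties, silently skipping any the node lacks."""
    return {k: props[k] for k in keep if k in props}
```

With the driver, a node's properties can be turned into a plain dict first, e.g. `project(dict(record["n"].items()))`.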


#### Write the results to a file

In [17]:
ff = open("data/mesh_queries_corona_heartdisease.json", 'w')

In [None]:
# Parameterized Cypher: a BodyText matches only if its lowercased text contains every term.
# Passing the terms as $terms (instead of concatenating strings) means a quote
# character inside a term cannot break the query.
query = (
    "MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)-"
    "[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText) "
    "WHERE all(term IN $terms WHERE toLower(a.text) CONTAINS term) "
    "RETURN DISTINCT p"
)

results = {}  # "corona phrases | heart phrases" -> list of paper property dicts

with driver.session() as session:
    for corona, heart in all_:
        # split(',') leaves a leading space on later parts (e.g. ' unstable'), so strip them
        corona_terms = [t.strip() for t in corona]
        heart_terms = [t.strip() for t in heart]
        records = session.run(query, terms=corona_terms + heart_terms)
        # Convert each Paper node into a plain dict of its properties
        MeSH_result = [dict(record["p"].items()) for record in records]
        if MeSH_result:
            key = ", ".join(corona_terms) + " | " + ", ".join(heart_terms)
            results[key] = MeSH_result
            print(key, "->", len(MeSH_result), "papers")

# Dump once, so the file holds a single valid JSON object
json.dump(results, ff)
ff.close()
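Calling `json.dump` several times on one file handle produces concatenated JSON values that `json.load` cannot read back. If results should be written incrementally as each query finishes, JSON Lines (one JSON object per line) is a common alternative. A minimal sketch with hypothetical helper names `to_jsonl` and `from_jsonl`:

```python
import json
from typing import Iterable, List


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def from_jsonl(text: str) -> List[dict]:
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Usage (the .jsonl path is hypothetical):
# with open("data/mesh_queries_corona_heartdisease.jsonl", "a", encoding="utf-8") as fh:
#     fh.write(to_jsonl([{"corona": corona, "heart": heart, "papers": MeSH_result}]))
```

Appending one line per pair keeps every completed query on disk even if a later query fails, at the cost of the reader having to parse line by line.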