### MeSH Based Query Search

This notebook shows how to search for documents that mention the diseases described in the CVD tree.

In [1]:
import pandas as pd
import json
from neo4j import GraphDatabase
import csv

#### Authentication to access covidgraph.org graph

In [2]:
covid_browser = "https://covid.petesis.com:7473"
covid_url = "bolt://covid.petesis.com:7687"
user = "public"
password = "corona"

driver = GraphDatabase.driver(uri=covid_url, auth=(user, password))

#### MeSH descriptor to its entity list
- Ex. ```C01.925.782.600.550.200.360: [feline infectious peritonitis]```
- A pandas DataFrame is convenient for handling a CSV file, especially for transforming data by mapping a ```lambda``` function over a column.

#### First obtain MeSH terms related to corona

In [3]:
MeSH_corona = pd.read_csv("input/mesh/corona.csv")
MeSH_corona = MeSH_corona.set_index('ID')
MeSH_corona.head()

ID,name
C01.925.782.600.550.200,Coronavirus Infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome


- Use a ```lambda``` function to map one column to another

In [4]:
MeSH_corona['phrases'] = MeSH_corona['name'].apply(lambda x: x.lower().strip())

In [5]:
MeSH_corona.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,coronavirus infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","enteritis, transmissible, of turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,feline infectious peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","gastroenteritis, transmissible, of swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,severe acute respiratory syndrome


In [6]:
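# Split on commas and strip each piece so that no phrase keeps a leading space
MeSH_corona['phrases'] = MeSH_corona['phrases'].apply(lambda x: [p.strip() for p in x.split(',')])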

In [7]:
MeSH_corona.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,[coronavirus infections]
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","[enteritis, transmissible, of turkeys]"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,[feline infectious peritonitis]
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","[gastroenteritis, transmissible, of swine]"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,[severe acute respiratory syndrome]


#### Obtaining immune system pathway terms

In [8]:
MeSH_immune = pd.read_csv("input/pathways/immune_system_pathways.csv", index_col=0)
MeSH_immune = MeSH_immune.set_index('RID')
MeSH_immune.head()

RID,name,species
R-HSA-174577,Activation of C3 and C5,Homo sapiens
R-HSA-1280218,Adaptive Immune System,Homo sapiens
R-HSA-879415,Advanced glycosylation endproduct receptor sig...,Homo sapiens
R-HSA-173736,Alternative complement activation,Homo sapiens
R-HSA-983170,"Antigen Presentation: Folding, assembly and pe...",Homo sapiens


In [9]:
MeSH_immune['name'] = MeSH_immune['name'].apply(lambda x: x.lower().strip())
MeSH_immune = MeSH_immune.drop(columns='species')
MeSH_immune.head()

RID,name
R-HSA-174577,activation of c3 and c5
R-HSA-1280218,adaptive immune system
R-HSA-879415,advanced glycosylation endproduct receptor sig...
R-HSA-173736,alternative complement activation
R-HSA-983170,"antigen presentation: folding, assembly and pe..."


In [10]:
MeSH_immune['name'] = MeSH_immune['name'].apply(lambda x: x.split(':')[0].strip())
MeSH_immune['name'] = MeSH_immune['name'].apply(lambda x: x.split('&'))

In [11]:
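def split_parenthetical(phrases):
    # Strip each phrase; turn 'a (b) c' into the two phrases 'a c' and 'b'
    result = []
    for val in phrases:
        val = val.strip()
        open_in, close_in = val.find('('), val.find(')')
        if -1 < open_in < close_in:
            inner = val[open_in + 1:close_in].strip()
            outer = (val[:open_in].strip() + ' ' + val[close_in + 1:].strip()).strip()
            result += [p for p in (outer, inner) if p]
        else:
            result.append(val)
    return result

# Build new lists instead of appending to a list while iterating over it
MeSH_immune['name'] = MeSH_immune['name'].apply(split_parenthetical)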

In [12]:
MeSH_immune.head()

RID,name
R-HSA-174577,[activation of c3 and c5]
R-HSA-1280218,[adaptive immune system]
R-HSA-879415,[advanced glycosylation endproduct receptor si...
R-HSA-173736,[alternative complement activation]
R-HSA-983170,[antigen presentation]


#### Combine value sets (corona mesh descriptions and immune pathways)

In [13]:
all_ = [(x, y) for x in MeSH_corona['phrases'] for y in MeSH_immune['name']]

In [14]:
all_[0:4]

[(['coronavirus infections'], ['activation of c3 and c5']),
 (['coronavirus infections'], ['adaptive immune system']),
 (['coronavirus infections'],
  ['advanced glycosylation endproduct receptor signaling']),
 (['coronavirus infections'], ['alternative complement activation'])]
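
The list comprehension above builds the Cartesian product of the two phrase columns. As a minimal sketch on toy data (the phrases are copied from the tables in this notebook; the variable names are illustrative only), `itertools.product` yields the same pairs and makes the expected number of pairs easy to check before running one graph query per pair:

```python
from itertools import product

corona_phrases = [["coronavirus infections"], ["feline infectious peritonitis"]]
immune_phrases = [
    ["activation of c3 and c5"],
    ["adaptive immune system"],
    ["alternative complement activation"],
]

pairs = list(product(corona_phrases, immune_phrases))

# Same pairs, in the same order, as the nested list comprehension
assert pairs == [(x, y) for x in corona_phrases for y in immune_phrases]

# The number of pairs is the product of the two list lengths
print(len(pairs))  # 6
```

With the full tables the number of pairs, and therefore of graph queries, grows as the product of the two row counts.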

#### MeSH to Doc Mapping
- Create a dictionary where the key is a MeSH descriptor and the value is a list of papers (publications) that mention the MeSH terms in their body text
- Each paper is represented as a dictionary that maps each attribute name (cord_uid, journal, title, etc.) to its value
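
To make the target structure concrete, here is a small sketch that builds such a dictionary from in-memory data instead of the graph. The descriptor and its phrase come from the corona MeSH table; the two papers, their body texts, and the helper name `build_mesh_to_docs` are hypothetical and exist only for illustration.

```python
def build_mesh_to_docs(descriptors, papers):
    """Map each MeSH descriptor to the papers whose body text mentions all its phrases."""
    mesh_to_docs = {}
    for descriptor, phrases in descriptors.items():
        mesh_to_docs[descriptor] = [
            paper["attributes"]
            for paper in papers
            if all(phrase in paper["body_text"].lower() for phrase in phrases)
        ]
    return mesh_to_docs


# Descriptor and phrase taken from the corona MeSH table
descriptors = {"C01.925.782.600.550.200.360": ["feline infectious peritonitis"]}

# Hypothetical papers: an attribute dictionary plus invented body text
papers = [
    {"attributes": {"cord_uid": "example-1", "title": "Example paper A"},
     "body_text": "Cats with Feline Infectious Peritonitis were enrolled."},
    {"attributes": {"cord_uid": "example-2", "title": "Example paper B"},
     "body_text": "This text does not mention the disease."},
]

mesh_to_docs = build_mesh_to_docs(descriptors, papers)
print(mesh_to_docs)
```

Only the first paper is kept, because matching is case-insensitive and requires every phrase of the descriptor to appear in the body text.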

##### Example of a paper node in the covid graph

In [15]:
# Fetch a single Paper node to inspect its properties
paper_query = "MATCH (n:Paper) RETURN n LIMIT 1"
with driver.session() as session:
    info = session.run(paper_query)
    for item in info:
        print(item)

<Record n=<Node id=3198 labels={'Paper'} properties={'cord_uid': 'zrmkq3mz', 'cord19-fulltext_hash': '41c7a01f11ed47591d99f45774e43e45aeba0619', 'journal': 'BMC Microbiol', 'publish_time': '2009-08-12', 'source': 'PMC', 'title': 'CAPIH: A Web interface for comparative analyses and visualization of host-HIV protein-protein interactions', '_hash_id': '3c4b2ee1430dc9ac53aca87c0fc0f7eb', 'url': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2782265/'}>>


#### Write the results to file

In [22]:
ff = open("data/mesh_queries_corona_immune.json", 'w')

In [None]:
# One parameterized query serves every (corona, immune) pair. Passing the phrases
# as $terms avoids quoting errors when a phrase contains an apostrophe.
query = (
    "MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)-"
    "[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText) "
    "WHERE all(term IN $terms WHERE LOWER(a.text) CONTAINS term) "
    "RETURN DISTINCT p"
)

for corona, immune in all_:
    # A body text must contain every corona phrase and every immune phrase
    terms = corona + immune

    with driver.session() as session:
        # Convert each returned Paper node into a plain dict of its properties
        MeSH_result = [dict(record["p"]) for record in session.run(query, terms=terms)]

    if MeSH_result:
        # Write one JSON array per line so the arrays stay separable in the file
        json.dump(MeSH_result, ff)
        ff.write("\n")

ff.close()
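
Because `json.dump` is called once per matching pair, the output file holds several JSON arrays in sequence rather than a single JSON document, so a plain `json.load` will not parse it. Below is a minimal sketch of one way to read such content back, using `json.JSONDecoder.raw_decode` from the standard library. The helper name `iter_json_values` and the sample string are illustrative, not part of the original notebook.

```python
import json


def iter_json_values(text):
    """Yield each JSON value from a string holding several JSON documents in a row."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip any whitespace (for example newlines) between values
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Two result lists written one after the other, as the loop above produces
sample = '[{"cord_uid": "a"}]\n[{"cord_uid": "b"}, {"cord_uid": "c"}]'
results = list(iter_json_values(sample))
print(len(results))  # 2
```

To apply it to the real file, pass the file's contents, for example `open("data/mesh_queries_corona_immune.json").read()`, to the helper.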