### MeSH Based Query Search

This notebook shows how to search for documents that mention the diseases described in the CVD tree.

In [1]:
import pandas as pd
import json
from neo4j import GraphDatabase
import csv

#### Authentication to access covidgraph.org graph

In [2]:
covid_browser = "https://covid.petesis.com:7473"
covid_url = "bolt://covid.petesis.com:7687"
user = "public"
password = "corona"

# Open a driver against the public covidgraph Bolt endpoint
driver = GraphDatabase.driver(covid_url, auth=(user, password))

#### MeSH descriptor to its entity list
- Ex. ```C01.925.782.600.550.200.360: [feline infectious peritonitis]```
- A pandas DataFrame is convenient for handling a CSV file, especially for transforming data by mapping a ```lambda``` function over a column.

In [3]:
MeSH = pd.read_csv("input/mesh/corona.csv")
MeSH = MeSH.set_index('ID')
MeSH.head()

ID,name
C01.925.782.600.550.200,Coronavirus Infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome


- Use a ```lambda``` function to derive a new column from an existing one

In [4]:
MeSH['phrases'] = MeSH['name'].apply(lambda x: x.lower().strip())

In [5]:
MeSH.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,coronavirus infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","enteritis, transmissible, of turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,feline infectious peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","gastroenteritis, transmissible, of swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,severe acute respiratory syndrome


In [6]:
# Split on commas and trim each piece so phrases carry no leading space
MeSH['phrases'] = MeSH['phrases'].apply(lambda x: [p.strip() for p in x.split(',')])

In [7]:
MeSH.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,[coronavirus infections]
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","[enteritis, transmissible, of turkeys]"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,[feline infectious peritonitis]
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","[gastroenteritis, transmissible, of swine]"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,[severe acute respiratory syndrome]
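
The two mapping steps above (lowercasing, then splitting on commas) can be folded into one helper. The following is a minimal sketch, not part of the original notebook: the name `name_to_phrases` is hypothetical, and it additionally trims whitespace around each phrase, since a plain `split(',')` keeps the space that follows each comma.

```python
def name_to_phrases(name):
    """Lowercase a MeSH name and split it on commas into trimmed phrases.

    Hypothetical helper for illustration; empty pieces are dropped.
    """
    return [part.strip() for part in name.lower().split(",") if part.strip()]


# Two names copied from the table above
names = ["Coronavirus Infections", "Enteritis, Transmissible, of Turkeys"]
phrases = [name_to_phrases(n) for n in names]
print(phrases)
# [['coronavirus infections'], ['enteritis', 'transmissible', 'of turkeys']]
```

On the DataFrame, the same helper could be applied in a single step with `MeSH['name'].apply(name_to_phrases)`.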


In [8]:
MeSH.index[0]

'C01.925.782.600.550.200'

#### MeSH to Doc Mapping
- Create a dictionary in which each key is a MeSH descriptor and each value is a list of papers (publications) whose body text mentions that descriptor's terms
- Each paper is represented as a dictionary that maps the paper's attribute names (cord_uid, journal, title, etc.) to their values

##### Example of a paper node in the covid graph

In [9]:
paper_query = "MATCH (n:Paper) RETURN n LIMIT 1"
with driver.session() as session:
    info = session.run(paper_query)
    for item in info:
        print(item)

<Record n=<Node id=3198 labels={'Paper'} properties={'cord_uid': 'zrmkq3mz', 'cord19-fulltext_hash': '41c7a01f11ed47591d99f45774e43e45aeba0619', 'journal': 'BMC Microbiol', 'publish_time': '2009-08-12', 'source': 'PMC', 'title': 'CAPIH: A Web interface for comparative analyses and visualization of host-HIV protein-protein interactions', '_hash_id': '3c4b2ee1430dc9ac53aca87c0fc0f7eb', 'url': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2782265/'}>>


#### Write the results to a file

In [10]:
ff = open("data/mesh_search/mesh_search_corona_queries.json", 'w')

In [11]:
MeSH_to_result = {}
for desc, entities in zip(MeSH.index, MeSH['phrases']):
    # Build the WHERE clause: one CONTAINS condition per phrase, joined with AND
    conditions = " AND ".join(
        "LOWER(a.text) CONTAINS '" + entity + "'" for entity in entities)
    query = "MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)-" \
            "[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText) WHERE (" \
            + conditions + ") RETURN DISTINCT p"

    MeSH_result = []
    with driver.session() as session:
        for item in session.run(query):
            # item[0] is the Paper node; copy its properties into a plain dict
            MeSH_result.append(dict(item[0].items()))

    MeSH_to_result[desc] = MeSH_result

# Dump once, after the loop, so the file holds a single valid JSON document
# (dumping inside the loop would append one growing object per descriptor)
json.dump(MeSH_to_result, ff)

ff.close()
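
The query above is assembled by concatenating phrases into the Cypher string, which would break if a phrase ever contained a single quote. As a hedged alternative, the sketch below builds the same `MATCH ... WHERE ... RETURN` text with query parameters instead. The function name `build_query` and the parameter names `p0`, `p1`, ... are assumptions introduced here for illustration, not part of the original notebook.

```python
def build_query(entities):
    """Return (cypher, params) matching papers whose body text contains every phrase.

    Hypothetical helper: phrases are passed as parameters $p0, $p1, ...
    instead of being concatenated into the query text.
    """
    if not entities:
        raise ValueError("at least one phrase is required")
    conditions = " AND ".join(
        f"LOWER(a.text) CONTAINS $p{i}" for i in range(len(entities))
    )
    cypher = (
        "MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)-"
        "[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText) "
        f"WHERE ({conditions}) RETURN DISTINCT p"
    )
    params = {f"p{i}": phrase for i, phrase in enumerate(entities)}
    return cypher, params


cypher, params = build_query(["enteritis", "transmissible", "of turkeys"])
print(cypher)
print(params)
```

Inside the loop, the pair could then be passed to the driver as `session.run(cypher, params)`, leaving quoting to the database.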

#### Print the MeSH-descriptor-to-publication dictionary created above

In [12]:
MeSH_to_result.items()
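
Printing every item can be hard to read when a descriptor matches many papers. As a rough sketch, a per-descriptor count gives a quicker overview. The helper name `count_papers` is hypothetical, and the sample dictionary holds placeholder entries that only mimic the structure of `MeSH_to_result`; they are not real query results.

```python
def count_papers(mesh_to_result):
    """Map each MeSH descriptor to the number of papers found for it."""
    return {desc: len(papers) for desc, papers in mesh_to_result.items()}


# Placeholder data with the same shape as MeSH_to_result (not real results)
sample = {
    "C01.925.782.600.550.200": [{"cord_uid": "placeholder-1"}, {"cord_uid": "placeholder-2"}],
    "C01.925.782.600.550.200.360": [],
}
counts = count_papers(sample)
print(counts)
# {'C01.925.782.600.550.200': 2, 'C01.925.782.600.550.200.360': 0}
```

Calling `count_papers(MeSH_to_result)` would give the same summary for the dictionary built in this notebook.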

