### MeSH Based Query Search

In this notebook, we discuss how to search documents based on the diseases described in the CVD tree.

In [1]:
import pandas as pd
import json
from neo4j import GraphDatabase
import csv

#### Authentication to access covidgraph.org graph

In [2]:
covid_browser = "https://covid.petesis.com:7473"  # browser UI, kept for reference only
covid_url = "bolt://covid.petesis.com:7687"       # Bolt endpoint used by the driver
user = "public"
password = "corona"

driver = GraphDatabase.driver(covid_url, auth=(user, password))

#### MeSH descriptor to its entity list
- Ex. ```C01.925.782.600.550.200.360: [feline infectious peritonitis]```
- A pandas DataFrame is very convenient for handling a CSV file, specifically for data transformation with a ```lambda``` mapping function.
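The descriptor-to-entity-list mapping described above can be sketched in plain Python. This is a minimal illustration, not the notebook's pipeline: the helper name `to_phrases` is hypothetical, and the two descriptors are copied from the corona table used in this notebook.

```python
def to_phrases(name):
    """Lower-case a MeSH name and split it on commas into trimmed phrases."""
    return [part.strip() for part in name.lower().split(",") if part.strip()]

# Two descriptors from the corona MeSH table (tree number -> name)
names = {
    "C01.925.782.600.550.200.360": "Feline Infectious Peritonitis",
    "C01.925.782.600.550.200.325": "Enteritis, Transmissible, of Turkeys",
}

# Tree number -> entity list
entity_lists = {tree: to_phrases(name) for tree, name in names.items()}
for tree, phrases in entity_lists.items():
    print(tree, phrases)
```

Splitting on commas turns an inverted MeSH name into several short phrases, each of which can then be searched for separately.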

#### First obtain MeSH terms related to corona

In [3]:
MeSH_corona = pd.read_csv("input/mesh/corona.csv")
MeSH_corona = MeSH_corona.set_index('ID')
MeSH_corona.head()

ID,name
C01.925.782.600.550.200,Coronavirus Infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome


- Use a ```lambda``` function to derive one column from another

In [4]:
MeSH_corona['phrases'] = MeSH_corona['name'].apply(lambda x: x.lower().strip())

In [5]:
MeSH_corona.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,coronavirus infections
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","enteritis, transmissible, of turkeys"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,feline infectious peritonitis
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","gastroenteritis, transmissible, of swine"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,severe acute respiratory syndrome


In [6]:
# Split on commas and trim each piece so no phrase keeps a leading space
MeSH_corona['phrases'] = MeSH_corona['phrases'].apply(lambda x: [p.strip() for p in x.split(',')])

In [7]:
MeSH_corona.head()

ID,name,phrases
C01.925.782.600.550.200,Coronavirus Infections,[coronavirus infections]
C01.925.782.600.550.200.325,"Enteritis, Transmissible, of Turkeys","[enteritis, transmissible, of turkeys]"
C01.925.782.600.550.200.360,Feline Infectious Peritonitis,[feline infectious peritonitis]
C01.925.782.600.550.200.400,"Gastroenteritis, Transmissible, of Swine","[gastroenteritis, transmissible, of swine]"
C01.925.782.600.550.200.750,Severe Acute Respiratory Syndrome,[severe acute respiratory syndrome]


#### Obtaining immune system pathway terms

In [8]:
MeSH_immune = pd.read_csv("input/pathways/immune_system_pathways.csv", index_col=0)
MeSH_immune = MeSH_immune.set_index('RID')
MeSH_immune.head()

RID,name,species
R-HSA-174577,Activation of C3 and C5,Homo sapiens
R-HSA-1280218,Adaptive Immune System,Homo sapiens
R-HSA-879415,Advanced glycosylation endproduct receptor sig...,Homo sapiens
R-HSA-173736,Alternative complement activation,Homo sapiens
R-HSA-983170,"Antigen Presentation: Folding, assembly and pe...",Homo sapiens


In [9]:
MeSH_immune['name'] = MeSH_immune['name'].apply(lambda x: x.lower().strip())
MeSH_immune = MeSH_immune.drop(columns='species')
MeSH_immune.head()

RID,name
R-HSA-174577,activation of c3 and c5
R-HSA-1280218,adaptive immune system
R-HSA-879415,advanced glycosylation endproduct receptor sig...
R-HSA-173736,alternative complement activation
R-HSA-983170,"antigen presentation: folding, assembly and pe..."


In [10]:
MeSH_immune['name'] = MeSH_immune['name'].apply(lambda x: x.split(':')[0].strip())
MeSH_immune['name'] = MeSH_immune['name'].apply(lambda x: x.split('&'))

In [11]:
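def split_parenthetical(phrase):
    """Return the phrase without its first parenthetical, followed by the parenthetical text."""
    phrase = phrase.strip()
    open_in, close_in = phrase.find('('), phrase.find(')')
    if open_in == -1 or close_in < open_in:
        return [phrase]
    outer = (phrase[:open_in].strip() + ' ' + phrase[close_in + 1:].strip()).strip()
    inner = phrase[open_in + 1:close_in].strip()
    return [p for p in (outer, inner) if p]

# Build a new list per row instead of mutating the list while iterating over it
MeSH_immune['name'] = MeSH_immune['name'].apply(
    lambda phrases: [p for phrase in phrases for p in split_parenthetical(phrase)])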

In [12]:
MeSH_immune.head()

RID,name
R-HSA-174577,[activation of c3 and c5]
R-HSA-1280218,[adaptive immune system]
R-HSA-879415,[advanced glycosylation endproduct receptor si...
R-HSA-173736,[alternative complement activation]
R-HSA-983170,[antigen presentation]


#### Combine value sets (corona mesh descriptions and immune pathways)

In [13]:
all_ = [(x, y) for x in MeSH_corona['phrases'] for y in MeSH_immune['name']]

In [14]:
all_[0:4]

[(['coronavirus infections'], ['activation of c3 and c5']),
 (['coronavirus infections'], ['adaptive immune system']),
 (['coronavirus infections'],
  ['advanced glycosylation endproduct receptor signaling']),
 (['coronavirus infections'], ['alternative complement activation'])]
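The pairing above is a Cartesian product: every corona phrase list is combined with every immune pathway phrase list. A minimal sketch of the same idea with `itertools.product`, using a few phrase lists copied from the tables in this notebook (the variable names are illustrative):

```python
from itertools import product

corona_phrases = [
    ["coronavirus infections"],
    ["severe acute respiratory syndrome"],
]
immune_phrases = [
    ["activation of c3 and c5"],
    ["adaptive immune system"],
    ["alternative complement activation"],
]

# product() yields the pairs in the same order as the nested list comprehension
pairs = list(product(corona_phrases, immune_phrases))
same_as_comprehension = pairs == [(x, y) for x in corona_phrases for y in immune_phrases]

print(len(pairs))             # 2 * 3 pairs
print(same_as_comprehension)
```

The number of pairs, and therefore the number of queries sent to the graph later on, grows as the product of the two list lengths.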

#### MeSH to Doc Mapping
- Create a dictionary where the key is a MeSH descriptor and the value is a list of papers (publications) whose body text mentions the MeSH terms
- Each paper is represented as a dictionary that maps each attribute name of the paper (cord_uid, journal, title, etc.) to its value

##### Example of a paper node in the covid graph

In [15]:
paper_query = "MATCH (n:Paper) RETURN n LIMIT 1"
with driver.session() as session:
    info = session.run(paper_query)
    for item in info:
        print(item)

<Record n=<Node id=3198 labels={'Paper'} properties={'cord_uid': 'zrmkq3mz', 'cord19-fulltext_hash': '41c7a01f11ed47591d99f45774e43e45aeba0619', 'journal': 'BMC Microbiol', 'publish_time': '2009-08-12', 'source': 'PMC', 'title': 'CAPIH: A Web interface for comparative analyses and visualization of host-HIV protein-protein interactions', '_hash_id': '3c4b2ee1430dc9ac53aca87c0fc0f7eb', 'url': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2782265/'}>>
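A node like the one printed above can be turned into the plain dictionary described earlier by copying its properties. The sketch below is an illustration only: `node_to_paper` is a hypothetical helper, and a plain `dict` stands in for the driver's node object so that the snippet runs without a database connection. It assumes only that the node exposes an `items()` method.

```python
import json

def node_to_paper(node):
    """Copy a node's properties into a plain dict."""
    return {key: value for key, value in node.items()}

# Stand-in for a Paper node, with a few properties from the record printed above
fake_node = {
    "cord_uid": "zrmkq3mz",
    "journal": "BMC Microbiol",
    "publish_time": "2009-08-12",
    "source": "PMC",
}

paper = node_to_paper(fake_node)
print(json.dumps(paper, indent=2))
```

A plain dictionary is convenient here because it can be passed straight to `json.dump`.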


#### Write data to file

In [20]:
ff = open("data/mesh_queries_corona_immune.json", 'w')

In [21]:
for corona, immune in all_:
    # One CONTAINS clause per phrase; the phrases are passed as query parameters
    # so that a quote inside a phrase cannot break the Cypher string
    terms = list(corona) + list(immune)
    where = " AND ".join("LOWER(a.text) CONTAINS $t%d" % i for i in range(len(terms)))
    query = ("MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)-"
             "[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText) "
             "WHERE (" + where + ") RETURN DISTINCT p")
    params = {"t%d" % i: term for i, term in enumerate(terms)}

    with driver.session() as session:
        # Each record holds one Paper node; dict(node) copies its properties
        MeSH_result = [dict(record["p"]) for record in session.run(query, params)]

    if MeSH_result:
        json.dump(MeSH_result, ff)
        ff.write("\n")  # one JSON list per line keeps the file parseable

ff.close()
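The clause construction in the loop above can be isolated in a small pure function, which makes it possible to inspect the generated Cypher without connecting to the graph. This is a sketch under assumptions: `build_paper_query` and the parameter names `t0`, `t1`, ... are hypothetical, and the phrases are passed as Cypher parameters (`$name`) rather than embedded in the query text.

```python
def build_paper_query(corona, immune):
    """Return (query, params) matching papers whose body text contains every phrase."""
    terms = list(corona) + list(immune)
    clauses = ["LOWER(a.text) CONTAINS $t%d" % i for i in range(len(terms))]
    query = (
        "MATCH (p:Paper)-[:PAPER_HAS_BODYTEXTCOLLECTION]-(:BodyTextCollection)-"
        "[:BODYTEXTCOLLECTION_HAS_BODYTEXT]-(a:BodyText) "
        "WHERE (" + " AND ".join(clauses) + ") RETURN DISTINCT p"
    )
    params = {"t%d" % i: term for i, term in enumerate(terms)}
    return query, params

query, params = build_paper_query(["coronavirus infections"], ["adaptive immune system"])
print(query)
print(params)
```

The pair can then be handed to the driver as `session.run(query, params)`.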

In [22]:
f2 = open("data/mesh_text_corona_immune.json", 'w')

In [23]:
for corona, immune in all_:
    # One CONTAINS clause per phrase; the phrases are passed as query parameters
    # so that a quote inside a phrase cannot break the Cypher string
    terms = list(corona) + list(immune)
    where = " AND ".join("LOWER(a.text) CONTAINS $t%d" % i for i in range(len(terms)))
    query = "MATCH (a:BodyText) WHERE (" + where + ") RETURN DISTINCT a"
    params = {"t%d" % i: term for i, term in enumerate(terms)}

    with driver.session() as session:
        # Keep only the 'text' property of each matching BodyText node
        MeSH_result = [{'text': record["a"]["text"]} for record in session.run(query, params)]

    if MeSH_result:
        print(MeSH_result)
        json.dump(MeSH_result, f2)
        f2.write("\n")  # one JSON list per line keeps the file parseable

f2.close()

[{'text': 'The majority of studies addressing T cell responses to respiratory virus infections come from mice infected with a variety of natural and mouse-adapted pathogens. A few studies use natural mouse pathogens such as Sendai virus, a mouse para-influenza type I pathogen and mouse hepatitis virus-1 (MHV-1). More commonly, mouse-adapted stains of human pathogens such as the A/Puerto Rico/8/1934 H1N1 (PR8) or A/WSN/33 H1N1 (WSN) strains of influenza and SARS-CoV-MA15 have been used to study innate and adaptive immune responses [25, 30] . Initiation of the immune response against invading pathogens begins with direct infection of airway epithelium. Following initial infection, lung-resident respiratory dendritic cells (rDCs) acquire the invading pathogen or antigens from infected epithelial cells, become activated, process antigen and migrate to the draining (mediastinal and cervical) lymph nodes (DLN). Once in the DLNs, rDCs present the processed antigen in the form of MHC/peptide c

[{'text': 'The pulmonary complications in human CoViD-19 patients are due to an exuberant local inflammatory response with diffuse alveolar damage. Patients dying because of SARS have lung consolidation, edema and mucopurulent material in the bronchial tree. At microscopic examination, alterations such as diffuse alveolar damage, hyaline membrane and fibrin formation, neutrophils and macrophages infiltrates were detected in the interstitium and alveoli. Similar features were noted in the only human autopsy report available of MERS infection. Cytokines and chemokines play a key role in the immune response against viral infections, and their altered production has been demonstrated in both SARS and MERS coronavirus infections. Such altered levels have been shown to be likely due to the low synthesis of antiviral cytokines such as interferons (IFN)α or β and in concert increased levels of other pro-inflammatory cytokines/chemokines that have pathogenic consequences. Among them, interleuki

[{'text': 'Aminopeptidase N (APN) is a widely expressed membrane-bound exopeptidase that belongs to a group of zinc-containing metalloproteases that include the consensus catalytic motif HEXXH [12] . APN can activate or inactivate bioactive peptides on the cell surface, and cause cytokine and extracellular matrix degradation to show enzymatic activity. APN plays a role in inflammatory and immunological responses, antigen processing, tumor invasion, and cell-cell contact. Moreover, APN modulates signals in monocytes [13] and is a receptor for several coronaviruses, i.e., canine coronavirus, feline infectious peritonitis virus, and transmissible gastroenteritis virus (TGEV) [14, 15] . Importantly, APN is also a direct receptor for ETEC F4 fimbriae and is associated with the induction of mucosal immunity [16] . FaeG from all three variants directly mediate the fimbrial binding of F4 + E. coli to host intestinal epithelial cells by binding to APN, while also modulating APN expression in IP

[{'text': 'One of the strategies to enhance DNA vaccine potency uses intracellular targeting strategies to enhance major histocompatibility complex (MHC) class I-II antigen presentation and processing in DCs. Previously, we have studied the linkage of calreticulin (CRT), a Ca 2+ -binding protein located in the endoplasmic reticulum (for review, see [5] ) to several antigens, including human papilloma virus type-16 (HPV-16) E7 [6, 7] , E6 [8] , and nucleocapsid protein of severe acute respiratory syndrome coronavirus [9] . Intradermal administration of CRT linked to any of these target antigens led to a significant increase in the antigen-specific CD8+ T cell (CTL) immune responses and impressive antitumor effects. Thus, CRT has been shown to be highly potent in enhancing the antigen-specific immune responses and antitumor effects generated by DNA vaccination in several preclinical models.'}, {'text': 'The concept of population genomics also applies to epidemiological studies of outbrea

[{'text': 'To determine whether the genomic regions with large deviations in ancestry are linked with human diseases and biological pathways, we applied enrichment analysis to the 341 unique genes mapping to the regions with significant evidence of EUR-, NAF-, or SSA-related ancestry deviations ( fig. 8 and supplementary table 1 , Supplementary Material online). The top annotations were dominated by skin, vascular, renal, autoimmune, and neuropsychiatric diseases as well as by DNA metabolism, amyloids, meiosis, and transcription pathways. In addition, many prevalent diseases, such as diabetes, asthma, and allergy, and infectious diseases as well as some severe conditions, such as oncologic and severe acute respiratory syndrome, were also significantly enriched (q value < 0.05). Regulation of inflammatory response, the complement pathway, telomere maintenance, and antigen processing and presentation were among the pathways significantly enriched (q value < 0.05) but ranked lower in the 

[{'text': 'VP24 and VP35 act as transcription activators [25] . The former perturbs interferon signaling and latter is an interferon antagonist, thus together they are capable of blocking production of interferons via STAT1 inhibition [25, 26] . VP40 is the matrix protein, which mediates virus-like particle budding [27] . Glycoprotein is the virulence factor that can be liberated or anchored to membrane [6] . These conjugated proteins are secreted into host extracellular space, in diverse truncated isoforms [28] . Full-length glycoproteins measure 150-170-kDa, and they are inserted into the viral membrane, through transcriptional editing [29] . These trimeric proteins with O-linked oligomannose glycans adhere to host cells and mediate fusion with host membrane [6] . Attachment to the endothelial cells via Niemann-Pick C1 receptors (C-type lectin membrane proteins) is followed by replication of the virus [30] . Antigenpresenting cells (APCs) like macrophages and dendritic cells are targ

[{'text': 'Adenoviruses (Ad) possess several attributes that make them suitable candidates for vaccine vectors [1, 2] . Ad exert an adjuvantlike effect by stimulating the innate immune system through both Toll-like receptor (TLR)-dependent and TLR-independent pathways [3, 4] . The effectiveness of Ad vector-based vaccines against many infectious diseases, including measles, severe acute respiratory syndrome (SARS), human immunodeficiency virus (HIV), hepatitis B and Ebola has been evaluated in animal models and clinical trials in humans [5] [6] [7] [8] [9] . Previously, we and others have explored the potential of a human Ad serotype 5 (HAd5) vectorbased vaccine strategy for H5N1 influenza [10] [11] [12] . Our immunogenicity and protective efficacy studies demonstrated that Ad vector-based vaccines provide complete protection against challenge with homologous and antigenically distinct strains of influenza viruses in a mouse model [11] .'}, {'text': 'HP is a plasma a2-glycoprotein that

[{'text': 'Because DNA vaccines induce long-lasting humoral and cellular immunity, they are a powerful tool in the fight against infectious diseases. Genetic vaccinations comprise eukaryot-ic expression plasmids that are inoculated into target cells, which then translate them and express the antigens. The efficacy of this technique correlates with the inflammation induced in muscle cells at the site of DNA vaccination, which causes the release of "danger signals" that induce local inflammatory responses and recruit immune cells to the site of vaccination. In this context, co-administration of plasmid DNA along with adjuvant-like cytokine genes, liposomes, or hyaluronidase, substantially improves the immunogenicity of the DNA vaccine [44] [45] [46] [47] [48] . The protective immunity conferred by DNA vaccines has been illustrated using animal models, including those infected by severe acute respiratory syndrome, influenza virus, or human immunodeficiency virus (HIV) [49] [50] [51] . DNA
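Because the loops above call `json.dump` once per term pair, each output file holds several JSON lists one after another, which a single `json.load` cannot read. A minimal sketch of reading such a file back, assuming its content has been loaded into a string; `iter_json_documents` is a hypothetical helper built on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json

def iter_json_documents(text):
    """Yield each JSON document from a string that holds several back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace (for example newlines) between documents
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Made-up sample with the same shape as the files written above
sample = '[{"text": "a"}][{"text": "b"}, {"text": "c"}]\n'
batches = list(iter_json_documents(sample))
print(len(batches))
```

Each yielded list corresponds to the results of one corona/immune term pair.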