### MeSH Query Search Setup

In this notebook, we set up what is needed to search documents by the diseases described in the cardiovascular disease (CVD) branch of the MeSH tree.

In [1]:
import json
import pandas as pd
from collections import Counter

#### MeSH ID Dictionary Creation
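Using the 2020 set of MeSH descriptors (the most recent at the time of writing), we create two dictionaries: one that maps each MeSH ID (here, the tree number, e.g. `C14.280`) to its heading name, and one that maps each name back to its ID. Because a heading can appear under several tree numbers, `name2id` keeps only the last tree number read for each name.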
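Each line of the MeSH tree file pairs a heading with one tree number, separated by a semicolon. The sketch below, which assumes that `Heading;TreeNumber` layout and uses a few in-memory sample lines instead of `mtrees2020.bin`, shows how to keep every tree number per heading rather than only the last one. `Example Heading` and its tree numbers are made up purely to illustrate a heading listed twice.

```python
from collections import defaultdict

# Sample lines in the "Heading;TreeNumber" layout. "Example Heading" is hypothetical,
# included only to show a heading that appears under two tree numbers.
sample_lines = [
    "Cardiovascular Diseases;C14\n",
    "Heart Diseases;C14.280\n",
    "Example Heading;C14.999\n",
    "Example Heading;C16.999\n",
]

sample_id2name = {}                  # tree number -> heading
sample_name2ids = defaultdict(list)  # heading -> every tree number it appears under
for line in sample_lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    name, tree_number = line.split(";", 1)
    sample_id2name[tree_number] = name
    sample_name2ids[name].append(tree_number)

print(sample_name2ids["Example Heading"])  # ['C14.999', 'C16.999']
```

A list-valued mapping like this avoids silently dropping tree numbers, at the cost of callers having to handle more than one ID per name.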

In [2]:
meshtree_file = "mtrees2020.bin"

In [3]:
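Tree = []      # list of {"id": tree number, "name": heading} records
id2name = {}   # tree number -> heading name
name2id = {}   # heading name -> tree number (only the last one read is kept
               # for headings listed under several tree numbers)
with open(meshtree_file, "r") as ftree:
    for line in ftree:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Each line has the form "Heading;TreeNumber", e.g. "Heart Diseases;C14.280"
        cur_term, cur_tree = line.split(";", 1)

        id2name[cur_tree] = cur_term
        name2id[cur_term] = cur_tree
        Tree.append({'id': cur_tree, 'name': cur_term})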

#### Cardiovascular Disease MeSH Descriptors
This code builds a data frame of all MeSH descriptors in the cardiovascular disease branch of the tree (tree numbers starting with `C14`).
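The same branch selection can also be written as a vectorized pandas filter. This is a minimal sketch on a small sample mapping whose entries are copied from outputs in this notebook; in practice the mapping would be the full `id2name` built above.

```python
import pandas as pd

# Sample tree-number -> heading mapping (entries taken from this notebook's outputs).
sample_id2name = {
    "C01.925.782.600.550.200": "Coronavirus Infections",
    "C14": "Cardiovascular Diseases",
    "C14.260": "Cardiovascular Infections",
    "C14.280": "Heart Diseases",
}

tree = pd.DataFrame({"ID": list(sample_id2name), "name": list(sample_id2name.values())})

# Keep rows in the C14 branch, then index by name and sort by tree number.
cvd_sample = (
    tree[tree["ID"].str.startswith("C14")]
    .set_index("name")
    .sort_values("ID")
)
print(cvd_sample)
```

Filtering on the column avoids the explicit Python loop, and because it starts from tree numbers it keeps every C14 entry even for headings listed in several branches.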

In [4]:
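CVDTree = []
# Iterates over name2id, so each heading appears once, under the last tree number read for it;
# a heading whose last tree number lies outside C14 is not picked up here.
for name, ID in name2id.items():
    if ID.startswith('C14'):
        CVDTree.append({"name": name, "ID": ID})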

In [5]:
len(CVDTree)

204

In [6]:
CVD = pd.DataFrame(CVDTree)
CVD = CVD.set_index('name')
CVD = CVD.sort_values("ID", ascending=True)

In [7]:
CVD.head()

| name | ID |
|---|---|
| Cardiovascular Diseases | C14 |
| Cardiovascular Infections | C14.260 |
| Syphilis, Cardiovascular | C14.260.500 |
| Tuberculosis, Cardiovascular | C14.260.750 |
| Heart Diseases | C14.280 |


- Writes all associated MeSH IDs to a CSV file (`cvd.csv`).
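Because the frame is indexed by `name`, the CSV starts with a `name,ID` header, and reading it back with `index_col="name"` restores the same layout. A minimal sketch of that round trip, using an in-memory buffer and two rows from the output above instead of `cvd.csv`:

```python
import io

import pandas as pd

# Two rows in the same shape as CVD: a "name" index and an "ID" column.
cvd_sample = pd.DataFrame(
    {"ID": ["C14", "C14.280"]},
    index=pd.Index(["Cardiovascular Diseases", "Heart Diseases"], name="name"),
)

buffer = io.StringIO()
cvd_sample.to_csv(buffer)  # same layout as CVD.to_csv("cvd.csv")
print(buffer.getvalue())   # header line is "name,ID"

buffer.seek(0)
restored = pd.read_csv(buffer, index_col="name")
print(restored.equals(cvd_sample))
```

Passing `index_col` matters: without it, the names come back as an ordinary column and pandas adds a fresh integer index.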

In [8]:
CVD.to_csv("cvd.csv")

#### Coronavirus Disease MeSH Descriptors
This code builds a data frame of all MeSH descriptors under Coronavirus Infections (tree number `C01.925.782.600.550.200`).
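The long prefix `C01.925.782.600.550.200` encodes the path from the top-level category down to Coronavirus Infections: each dot-separated segment is one level of the hierarchy, so dropping trailing segments yields the ancestors. A small sketch with a hypothetical helper `ancestors` (not part of the notebook):

```python
def ancestors(tree_number):
    """Return every ancestor tree number of `tree_number`, from the top-level category down."""
    parts = tree_number.split(".")
    # Each ancestor is the prefix made of the first i dot-separated segments.
    return [".".join(parts[:i]) for i in range(1, len(parts))]

print(ancestors("C01.925.782.600.550.200"))
# ['C01', 'C01.925', 'C01.925.782', 'C01.925.782.600', 'C01.925.782.600.550']
```

Looking each result up in `id2name` would give the chain of broader headings above Coronavirus Infections.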

In [9]:
CoronaTree = []
for ID, name in id2name.items():
    if ID.startswith('C01.925.782.600.550.200'):
        CoronaTree.append({"name": name, "ID": ID})

In [10]:
len(CoronaTree)

5

In [11]:
Corona = pd.DataFrame(CoronaTree)
Corona = Corona.set_index('name')
Corona = Corona.sort_values("ID", ascending=True)

In [12]:
Corona.head()

| name | ID |
|---|---|
| Coronavirus Infections | C01.925.782.600.550.200 |
| Enteritis, Transmissible, of Turkeys | C01.925.782.600.550.200.325 |
| Feline Infectious Peritonitis | C01.925.782.600.550.200.360 |
| Gastroenteritis, Transmissible, of Swine | C01.925.782.600.550.200.400 |
| Severe Acute Respiratory Syndrome | C01.925.782.600.550.200.750 |


- Writes all associated MeSH IDs to a CSV file (`corona.csv`).

In [13]:
Corona.to_csv("corona.csv")

#### Specific Heart Disease MeSH Descriptors
This code builds a data frame of all MeSH descriptors under one specific type of cardiovascular disease, Myocardial Ischemia (tree number `C14.280.647`).
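Since every descriptor in this subtree shares the `C14.280.647` prefix, its depth below the root is the number of extra dot-separated segments. A minimal sketch, using the five rows shown in the `HeartDisease.head()` output below, that adds a depth column and keeps only the direct children of Myocardial Ischemia:

```python
import pandas as pd

# Rows copied from the HeartDisease.head() output in this notebook.
heart_sample = pd.DataFrame(
    {"ID": ["C14.280.647", "C14.280.647.124", "C14.280.647.187",
            "C14.280.647.187.150", "C14.280.647.187.150.150"]},
    index=pd.Index(["Myocardial Ischemia", "Acute Coronary Syndrome", "Angina Pectoris",
                    "Angina, Unstable", "Angina Pectoris, Variant"], name="name"),
)

root = "C14.280.647"
# Depth below the root = dots in the tree number minus dots in the root.
heart_sample["depth"] = heart_sample["ID"].str.count(r"\.") - root.count(".")
direct_children = heart_sample[heart_sample["depth"] == 1]
print(direct_children)
```

`str.count` treats its argument as a regular expression, which is why the dot is escaped.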

In [14]:
HeartDiseaseTree = []
for ID, name in id2name.items():
    if ID.startswith('C14.280.647'):
        HeartDiseaseTree.append({"name": name, "ID": ID})

In [15]:
len(HeartDiseaseTree)

24

In [16]:
HeartDisease = pd.DataFrame(HeartDiseaseTree)
HeartDisease = HeartDisease.set_index('name')
HeartDisease = HeartDisease.sort_values("ID",ascending=True)

In [17]:
HeartDisease.head()

| name | ID |
|---|---|
| Myocardial Ischemia | C14.280.647 |
| Acute Coronary Syndrome | C14.280.647.124 |
| Angina Pectoris | C14.280.647.187 |
| Angina, Unstable | C14.280.647.187.150 |
| Angina Pectoris, Variant | C14.280.647.187.150.150 |


In [18]:
HeartDisease.to_csv("heart_disease.csv")

- Writes all associated MeSH IDs to a CSV file (`heart_disease.csv`).
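The three subtree cells above repeat the same filter, build, index, and sort steps. One way to fold them into a single function is sketched below; `subtree_frame` is a hypothetical helper, not part of the original notebook, and the sample mapping uses entries from this notebook's outputs.

```python
import pandas as pd


def subtree_frame(id2name, root):
    """Return a name-indexed DataFrame of every tree number equal to or under `root`, sorted by ID.

    Hypothetical helper: it mirrors the loop, set_index and sort_values cells above.
    """
    rows = [
        {"name": name, "ID": tree_number}
        for tree_number, name in id2name.items()
        if tree_number == root or tree_number.startswith(root + ".")
    ]
    # Passing columns explicitly keeps set_index working even when no rows match.
    return pd.DataFrame(rows, columns=["name", "ID"]).set_index("name").sort_values("ID")


sample_id2name = {
    "C01.925.782.600.550.200": "Coronavirus Infections",
    "C14": "Cardiovascular Diseases",
    "C14.280": "Heart Diseases",
    "C14.280.647": "Myocardial Ischemia",
    "C14.280.647.124": "Acute Coronary Syndrome",
}
print(subtree_frame(sample_id2name, "C14.280.647"))
```

With the full mapping, `subtree_frame(id2name, "C14.280.647").to_csv("heart_disease.csv")` would reproduce the last cell in one line.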