### MeSH Query Search Setup

In this notebook, we set up the system needed to search documents based on the diseases described in the CVD tree.

In [None]:
import sys
# pd.read_excel (used below) relies on xlrd to read the Excel spreadsheet in this pandas version
!{sys.executable} -m pip install xlrd

In [1]:
import json
import pandas as pd
from collections import Counter

In [None]:
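import sys
# Pin the pandas version used in this notebook; if pandas was already
# imported, restart the kernel so the pinned version is the one loaded.
!{sys.executable} -m pip install pandas==0.23.3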

In [2]:
pd.__version__

'0.23.3'

#### MeSH ID Dictionary Creation
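Using the 2020 MeSH tree file (`mtrees2020.bin`), two dictionaries are created: `id2name`, which maps each MeSH tree number (referred to in this notebook as the MeSH ID) to its descriptor name, and `name2id`, which maps each descriptor name back to a tree number.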

In [3]:
meshtree_file = "mtrees2020.bin"

In [4]:
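Tree = []
id2name = {}  # tree number -> descriptor name
name2id = {}  # descriptor name -> tree number
with open(meshtree_file, "r") as ftree:
    for line in ftree:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on term_tree[1]
        # Each line has the form "Descriptor Name;Tree.Number"
        term_tree = line.split(";")
        cur_term, cur_tree = term_tree[0], term_tree[1]

        id2name[cur_tree] = cur_term
        # A descriptor can sit at several tree locations, so this keeps
        # only the last tree number read for each name.
        name2id[cur_term] = cur_tree
        Tree.append({'id': cur_tree, 'name': cur_term})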
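Because one descriptor can appear at several tree locations (Cough, for example, has both C08.618.248 and C23.888.852.293 in the spreadsheet output), `name2id` keeps only one tree number per name. A minimal sketch of an index that keeps all of them, assuming the `Name;TreeNumber` line layout the parser above already relies on; `build_name_index` and the sample lines are hypothetical:

```python
from collections import defaultdict

# Illustrative lines in the "Name;TreeNumber" layout assumed for mtrees2020.bin.
# The tree numbers are the ones this notebook shows for Fever and Cough.
sample_lines = [
    "Fever;C23.888.119.344\n",
    "Cough;C08.618.248\n",
    "Cough;C23.888.852.293\n",
    "\n",  # a blank line, which the parser skips
]


def build_name_index(lines):
    """Map each descriptor name to every tree number it appears under."""
    name2ids = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Tree numbers never contain ";", so split on the last one.
        name, tree_number = line.rsplit(";", 1)
        name2ids[name].append(tree_number)
    return dict(name2ids)


name2ids = build_name_index(sample_lines)
print(name2ids["Cough"])  # ['C08.618.248', 'C23.888.852.293']
```

In the notebook, passing the open `ftree` file object instead of `sample_lines` would index the whole tree file.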

#### Signs and Symptoms MeSH Descriptors
This code creates a DataFrame of all MeSH descriptors associated with coronavirus signs and symptoms, read from the first sheet of the spreadsheet.

In [23]:
signs_data = pd.read_excel(r'Covid Symptoms and Comorbidities updated.xlsx', sheet_name=0)
signs_df = pd.DataFrame(signs_data, columns=['MeSH Header', 'MeSH tree'])

In [24]:
signs_df.head(5)

| | MeSH Header | MeSH tree |
|---|---|---|
| 0 | Fever | C23.888.119.344 |
| 1 | Chills | C23.888.208 |
| 2 | Cough | C08.618.248\nC23.888.852.293 |
| 3 | Fatigue | C23.888.369 |
| 4 | Dyspnea | C08.618.326\nC23.888.852.371 |


In [25]:
signs_df = signs_df.dropna()
mesh_signs = signs_df['MeSH tree'].values.tolist()

In [26]:
mesh_signs[0]

'C23.888.119.344'

In [27]:
Signs = []
for mesh in mesh_signs:
    # A cell may hold several tree numbers separated by newlines;
    # split('\n') returns a one-element list when there is only one.
    for val in mesh.split('\n'):
        val = val.strip()
        if val:
            Signs.append({"name": id2name[val], "ID": val})
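The loop relies on every tree number in the spreadsheet matching a key in `id2name` exactly, so a stray space or an outdated tree number raises `KeyError`. A minimal sketch of the same expansion that collects unmatched tree numbers for review instead of stopping; `expand_tree_cells` and the toy inputs are hypothetical:

```python
def expand_tree_cells(cells, id2name):
    """Split newline-separated tree-number cells into {"name", "ID"} records.

    Tree numbers missing from id2name are collected instead of raising KeyError.
    """
    records, missing = [], []
    for cell in cells:
        for tree_number in cell.split("\n"):
            tree_number = tree_number.strip()  # drop stray spaces
            if not tree_number:
                continue
            if tree_number in id2name:
                records.append({"name": id2name[tree_number], "ID": tree_number})
            else:
                missing.append(tree_number)
    return records, missing


# Toy inputs: a small id2name and three cells, the last one not in the dictionary.
toy_id2name = {
    "C23.888.119.344": "Fever",
    "C08.618.248": "Cough",
    "C23.888.852.293": "Cough",
}
cells = ["C23.888.119.344", "C08.618.248\nC23.888.852.293 ", "C99.999"]
records, missing = expand_tree_cells(cells, toy_id2name)
print(len(records), missing)  # 3 ['C99.999']
```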

In [28]:
len(Signs)

49

In [29]:
Sign_Symptom = pd.DataFrame(Signs)
Sign_Symptom = Sign_Symptom.set_index('name')
Sign_Symptom = Sign_Symptom.sort_values("ID", ascending=True)

In [30]:
Sign_Symptom.head()

| name | ID |
|---|---|
| Pharyngitis | C01.748.561 |
| Shock, Septic | C01.757.800 |
| Myalgia | C05.651.542 |
| Liver Failure | C06.552.308.500 |
| Pharyngitis | C07.550.781 |
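The `name` index is not unique: Pharyngitis appears at both C01.748.561 and C07.550.781, so the row count reflects tree locations rather than distinct descriptors. The `Counter` imported at the top of the notebook is one way to see this; a minimal sketch on the five rows shown above (in the notebook, `Signs` would take the place of the hard-coded list):

```python
from collections import Counter

# Records in the same {"name", "ID"} shape as Signs; values copied from the output above.
signs = [
    {"name": "Pharyngitis", "ID": "C01.748.561"},
    {"name": "Shock, Septic", "ID": "C01.757.800"},
    {"name": "Myalgia", "ID": "C05.651.542"},
    {"name": "Liver Failure", "ID": "C06.552.308.500"},
    {"name": "Pharyngitis", "ID": "C07.550.781"},
]

# Count tree locations per descriptor name.
locations = Counter(record["name"] for record in signs)
# Keep only names that occur at more than one tree location.
repeated = {name: n for name, n in locations.items() if n > 1}
print(repeated)  # {'Pharyngitis': 2}
print(len(signs), len(locations))  # 5 4
```

On the DataFrame itself, `Sign_Symptom.index.value_counts()` gives the same per-name counts.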


- Write all associated MeSH IDs, indexed by descriptor name, to a CSV file (`signs.csv`)

In [31]:
Sign_Symptom.to_csv("signs.csv")

#### Comorbidities MeSH Descriptors
This code creates a DataFrame of all MeSH descriptors associated with coronavirus comorbidities, read from the third sheet (`sheet_name=2`) of the same spreadsheet.

In [35]:
comorb_data = pd.read_excel(r'Covid Symptoms and Comorbidities updated.xlsx', sheet_name=2)
comorb_df = pd.DataFrame(comorb_data, columns=['MeSH Heading', 'MeSH Tree'])

In [36]:
comorb_df.head(5)

| | MeSH Heading | MeSH Tree |
|---|---|---|
| 0 | NaN | NaN |
| 1 | Heart Failure | C14.280.434 |
| 2 | Coronary Artery Disease | C14.280.647.250.260\nC14.907.137.126.339\nC14.... |
| 3 | Cardiomyopathies | C14.280.238 |
| 4 | Neoplasms | C04 |
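Some comorbidities are broad headings: Neoplasms is listed by its top-level tree number C04 alone. In the MeSH tree, each child's tree number extends its parent's with another dot-separated segment, so if a document search should also match narrower descriptors, a prefix test on tree numbers can expand a heading to its subtree. A minimal sketch, assuming a dictionary shaped like `id2name`; `in_subtree`, `descendants`, and the placeholder entries are hypothetical:

```python
def in_subtree(tree_number, root):
    """True if tree_number equals root or lies beneath it in the MeSH hierarchy.

    Requiring a "." after root keeps the match aligned to whole segments.
    """
    return tree_number == root or tree_number.startswith(root + ".")


def descendants(id2name, root):
    """All (tree_number, name) pairs at or below root, sorted by tree number."""
    return sorted((t, n) for t, n in id2name.items() if in_subtree(t, root))


# Toy dictionary: C04 and C14.280.434 come from this notebook,
# the other two entries are hypothetical placeholders.
toy_id2name = {
    "C04": "Neoplasms",
    "C04.001": "placeholder child",
    "C04.001.002": "placeholder grandchild",
    "C14.280.434": "Heart Failure",
}
print([t for t, _ in descendants(toy_id2name, "C04")])  # ['C04', 'C04.001', 'C04.001.002']
```

In the notebook this would be called as `descendants(id2name, "C04")`.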


In [38]:
comorb_df = comorb_df.dropna()
mesh_comorb = comorb_df['MeSH Tree'].values.tolist()
mesh_comorb[0]

'C14.280.434'

In [39]:
Comorbidities = []
for mesh in mesh_comorb:
    # A cell may hold several tree numbers separated by newlines;
    # split('\n') returns a one-element list when there is only one.
    for val in mesh.split('\n'):
        val = val.strip()
        if val:
            Comorbidities.append({"name": id2name[val], "ID": val})

In [40]:
len(Comorbidities)

50

In [41]:
Comorb = pd.DataFrame(Comorbidities)
Comorb = Comorb.set_index('name')
Comorb = Comorb.sort_values("ID", ascending=True)

In [42]:
Comorb.head()

| name | ID |
|---|---|
| HIV | B04.820.650.589.650.350 |
| Acquired Immunodeficiency Syndrome | C01.778.640.400.040 |
| Acquired Immunodeficiency Syndrome | C01.925.782.815.616.400.040 |
| Acquired Immunodeficiency Syndrome | C01.925.813.400.040 |
| Acquired Immunodeficiency Syndrome | C01.925.839.040 |
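Not every extracted tree number is a disease: the HIV entry B04.820.650.589.650.350 lies in the B (Organisms) category of MeSH, while the other rows fall under C (Diseases). If a downstream search is meant to cover diseases only, the first letter of each tree number is enough to separate them. A minimal sketch on the rows shown above:

```python
from collections import defaultdict

# Rows copied from the Comorb output above, as (name, tree number) pairs.
comorb = [
    ("HIV", "B04.820.650.589.650.350"),
    ("Acquired Immunodeficiency Syndrome", "C01.778.640.400.040"),
    ("Acquired Immunodeficiency Syndrome", "C01.925.782.815.616.400.040"),
    ("Acquired Immunodeficiency Syndrome", "C01.925.813.400.040"),
    ("Acquired Immunodeficiency Syndrome", "C01.925.839.040"),
]

# Group names by the category letter that starts every tree number
# (B = Organisms, C = Diseases).
by_category = defaultdict(set)
for name, tree_number in comorb:
    by_category[tree_number[0]].add(name)

print(dict(by_category))  # {'B': {'HIV'}, 'C': {'Acquired Immunodeficiency Syndrome'}}
```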


In [43]:
Comorb.to_csv("comorbidities.csv")

- Write all associated MeSH IDs, indexed by descriptor name, to a CSV file (`comorbidities.csv`)