## Part 1: Reading JSON file into a dataframe and saving/examining it.

In [1]:
import pandas as pd
import json

Minidata length and sample

In [2]:
df_mini = pd.read_json('data/english-train.json')
print("data length:", len(df_mini)) 
df_mini.head(n=5)

data length: 482


Unnamed: 0,description,utterances
0,throat a bit sore and want to get a good imune...,[patient: throat a bit sore and want to get a ...
1,"hey there i have had cold ""symptoms"" for over ...","[patient: hey there i have had cold ""symptoms""..."
2,i have a tight and painful chest with a dry co...,[patient: i have a tight and painful chest wit...
3,what will happen after the incubation period f...,[patient: what will happen after the incubatio...
4,suggest treatment for pneumonia,[patient: just found out i was pregnant. yeste...


Large data length and sample

In [3]:
df_large = pd.read_json('data/full-train_data.json')
print("data length:", len(df_large)) 
df_large.head(n=5)

data length: 2725990


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,154,155,156,157,158,159,160,161,162,163
0,病人：强制性脊柱炎，晚上睡觉翻身时腰骶骨区域疼痛，其他身体任何部位均不疼痛。,医生：应该没有问题，但最好把图像上传看看。,,,,,,,,,...,,,,,,,,,,
1,"病人：先天性髋关节发育不良，半脱位.，右侧髋和膝疼痛不能随意行走.拍过x片,为先天性髋关节发...",医生：谢谢你的问题，门诊我们见到了，有事电话联系。,,,,,,,,,...,,,,,,,,,,
2,病人：无，5月11号在贵医院做了刮片检查，标本满意，细胞量>40%，有颈管细胞，无化生细胞，...,"医生：涂片结果基本正常,以后可1-2年一次,主要是早期筛查宫颈病变。",,,,,,,,,...,,,,,,,,,,
3,病人：盐酸缺乏症，自去年十一月份伤到肋骨住院半月回家后，由于花钱多，有很长时间没有挣到钱，一...,医生：到精神疾病专科医院就诊。,,,,,,,,,...,,,,,,,,,,
4,病人：门牙龋齿，本人在西安上学，我的门牙有一块龋齿了，在两颗门牙中间的牙缝处，面积有一颗芝麻...,医生：你好！你的问题治疗起来并不难，目前最好使用纳米树脂美容修复，大约费用每牙三百多元；后牙...,,,,,,,,,...,,,,,,,,,,


## Using the BioBert model on a single data sample

### Analyzing the mini data to see how many conversations have two utterances (two parts)

First, we count the number of utterances in each conversation. Each `utterances` cell is already a list, so `len` gives the count directly.

In [4]:
utterance_count_dict = {}
for index, row in df_mini.iterrows():
    utterances = row["utterances"]
    utterance_count = len(utterances)  # number of utterances in this conversation
    if utterance_count not in utterance_count_dict:
        utterance_count_dict[utterance_count] = 1
    else:
        utterance_count_dict[utterance_count] += 1

Printing the distribution of utterance counts:

In [5]:
print(utterance_count_dict)

{2: 480, 17: 1, 4: 1}


As we can see, the vast majority of the dataset (480 of 482 conversations) consists of two-utterance exchanges.
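The counting loop above can also be written with `collections.Counter`. The following is a minimal sketch on hypothetical toy conversations (the `conversations` list is an assumption standing in for `df_mini["utterances"]`), not the notebook's actual data.

```python
from collections import Counter

# Hypothetical toy conversations standing in for df_mini["utterances"]
conversations = [
    ["patient: a", "doctor: b"],
    ["patient: c", "doctor: d"],
    ["patient: e", "doctor: f", "patient: g", "doctor: h"],
]

# Map each conversation to its number of utterances and tally the lengths
utterance_counts = Counter(len(utterances) for utterances in conversations)
print(dict(utterance_counts))
```

On the real dataframe, `df_mini["utterances"].apply(len).value_counts()` should give the same tally in one line.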

### Isolating the utterance pairs (they always start with the patient)

In [6]:
df_mini_pairs = df_mini[df_mini["utterances"].apply(len) == 2]
print("number of minipairs data:", len(df_mini_pairs))

number of minipairs data: 480


In [7]:
df_mini_pairs.head(n=1)

Unnamed: 0,description,utterances
0,throat a bit sore and want to get a good imune...,[patient: throat a bit sore and want to get a ...


Example: reading a row by index

In [8]:
# Print both utterances of the first paired conversation
index = 0
print("patient text:", df_mini_pairs.iloc[index]["utterances"][0])
print("clinician text:", df_mini_pairs.iloc[index]["utterances"][1])

patient text: patient: throat a bit sore and want to get a good imune booster, especially in light of the virus. please advise. have not been in contact with nyone with the virus.
clinician text: doctor: during this pandemic. throat pain can be from a strep throat infection (antibiotics needed), a cold or influenza or other virus, or from some other cause such as allergies or irritants. usually, a person sees the doctor (call first) if the sore throat is bothersome, recurrent, or doesn't go away quickly. covid-19 infections tend to have cough, whereas strep throat usually lacks cough but has more throat pain. (3/21/20)


Saving the paired data

In [9]:
df_mini_pairs.to_csv("english-train-paired-conversations.csv")

### Now, testing BioBert on this data

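We load `d4data/biomedical-ner-all`, a fine-tuned biomedical NER model from the Hugging Face Hub (referred to as BioBert in this notebook).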

In [10]:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("d4data/biomedical-ner-all")
model = AutoModelForTokenClassification.from_pretrained("d4data/biomedical-ner-all")

pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple") # pass device=0 if using gpu
pipe("""The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.""")

Device set to use cpu


[{'entity_group': 'Sign_symptom',
  'score': np.float32(0.9999311),
  'word': 'pal',
  'start': 38,
  'end': 41},
 {'entity_group': 'Sign_symptom',
  'score': np.float32(0.90633166),
  'word': '##pitations',
  'start': 41,
  'end': 50},
 {'entity_group': 'Clinical_event',
  'score': np.float32(0.99975544),
  'word': 'follow',
  'start': 54,
  'end': 60},
 {'entity_group': 'Date',
  'score': np.float32(0.999867),
  'word': '6 months after',
  'start': 64,
  'end': 78}]
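The output above splits "palpitations" into two entities (`pal` and `##pitations`) even with `aggregation_strategy="simple"`. One possible workaround, sketched here under the assumption that fragments of one word always share an entity group and have touching character spans, is to merge them after the fact. The helper name `merge_adjacent_entities` is hypothetical and not part of `transformers`; scores are plain floats here for simplicity.

```python
def merge_adjacent_entities(entities, text):
    """Merge consecutive entities with the same group whose spans touch."""
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev["entity_group"] == ent["entity_group"]
                and prev["end"] == ent["start"]):
            # Extend the previous span and rebuild the word from the source text
            prev["end"] = ent["end"]
            prev["word"] = text[prev["start"]:prev["end"]]
            prev["score"] = min(prev["score"], ent["score"])  # conservative score
        else:
            merged.append(dict(ent))  # copy so the input list is not mutated
    return merged


text = ("The patient reported no recurrence of palpitations "
        "at follow-up 6 months after the ablation.")
raw = [
    {"entity_group": "Sign_symptom", "score": 0.9999, "word": "pal", "start": 38, "end": 41},
    {"entity_group": "Sign_symptom", "score": 0.9063, "word": "##pitations", "start": 41, "end": 50},
    {"entity_group": "Clinical_event", "score": 0.9998, "word": "follow", "start": 54, "end": 60},
    {"entity_group": "Date", "score": 0.9999, "word": "6 months after", "start": 64, "end": 78},
]
merged = merge_adjacent_entities(raw, text)
print([e["word"] for e in merged])
```

The pipeline's word-level aggregation strategies (`"first"`, `"average"`, `"max"`) may also avoid such splits; whether they do for this model has not been checked here.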

Testing on a single data point.

Strip the speaker label (`patient:` or `doctor:`) from each utterance to get the raw text

In [11]:
# Strip the speaker labels to get the raw utterance text
index = 0
patient = df_mini_pairs.iloc[index]["utterances"][0].replace("patient:","").strip() # raw patient string with the "patient:" label removed
clinician = df_mini_pairs.iloc[index]["utterances"][1].replace("doctor:","").strip() # raw clinician string with the "doctor:" label removed
print(patient,"\n",clinician)

throat a bit sore and want to get a good imune booster, especially in light of the virus. please advise. have not been in contact with nyone with the virus. 
 during this pandemic. throat pain can be from a strep throat infection (antibiotics needed), a cold or influenza or other virus, or from some other cause such as allergies or irritants. usually, a person sees the doctor (call first) if the sore throat is bothersome, recurrent, or doesn't go away quickly. covid-19 infections tend to have cough, whereas strep throat usually lacks cough but has more throat pain. (3/21/20)
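Note that `str.replace` removes every occurrence of the label, including any that appear in the middle of an utterance. A minimal sketch of a stricter alternative that only strips a leading label is shown below; the helper name `strip_speaker` and its `speakers` parameter are hypothetical.

```python
def strip_speaker(utterance, speakers=("patient", "doctor")):
    """Remove a leading 'patient:' or 'doctor:' label, leaving the rest intact."""
    for speaker in speakers:
        prefix = speaker + ":"
        if utterance.startswith(prefix):
            return utterance[len(prefix):].strip()
    return utterance.strip()  # no known label found: only trim whitespace


example = "patient: my doctor: said to rest"
print(strip_speaker(example))
```

With `replace("doctor:", "")`, the same example would lose the inner `doctor:` as well.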


In [12]:
pipe(patient)

[{'entity_group': 'Biological_structure',
  'score': np.float32(0.9998447),
  'word': 'throat',
  'start': 0,
  'end': 6},
 {'entity_group': 'Sign_symptom',
  'score': np.float32(0.9993999),
  'word': 'sore',
  'start': 13,
  'end': 17}]

In [13]:
result = pipe(clinician)
print(result)

[{'entity_group': 'Biological_structure', 'score': np.float32(0.99981815), 'word': 'throat', 'start': 22, 'end': 28}, {'entity_group': 'Sign_symptom', 'score': np.float32(0.9998851), 'word': 'pain', 'start': 29, 'end': 33}, {'entity_group': 'Biological_structure', 'score': np.float32(0.999597), 'word': 'throat', 'start': 54, 'end': 60}, {'entity_group': 'Sign_symptom', 'score': np.float32(0.830349), 'word': 'sore', 'start': 240, 'end': 244}, {'entity_group': 'Biological_structure', 'score': np.float32(0.95489234), 'word': 'throat', 'start': 245, 'end': 251}, {'entity_group': 'Disease_disorder', 'score': np.float32(0.4631344), 'word': 'co', 'start': 306, 'end': 308}, {'entity_group': 'Disease_disorder', 'score': np.float32(0.57466906), 'word': '##vid - 19', 'start': 308, 'end': 314}, {'entity_group': 'Disease_disorder', 'score': np.float32(0.9928797), 'word': 'infections', 'start': 315, 'end': 325}, {'entity_group': 'Biological_structure', 'score': np.float32(0.8288955), 'word': 'st', '

In [14]:
result[0]["entity_group"]

'Biological_structure'

# Finding entities: analyzing the mini data using BioBert


Finding the entity groups detected in the patient and clinician text, and counting the mapped entities in each group

In [15]:
patient_entity_dict = {}
clinician_entity_dict = {}
for index, row in df_mini_pairs.iterrows():
    utterances = row["utterances"]

    try:
        patient = utterances[0].replace("patient:", "").strip()
        patient_results = pipe(patient)
        # loop through the entities and count each entity group
        for entity in patient_results:  # `entity` avoids shadowing the built-in `dict`
            curr_entity = entity["entity_group"]
            patient_entity_dict[curr_entity] = patient_entity_dict.get(curr_entity, 0) + 1
    except Exception as exc:  # report the error instead of silently hiding it
        print("no patient mapping at index:", index, "-", exc)

    try:
        clinician = utterances[1].replace("doctor:", "").strip()
        clinician_results = pipe(clinician)
        for entity in clinician_results:
            curr_entity = entity["entity_group"]
            clinician_entity_dict[curr_entity] = clinician_entity_dict.get(curr_entity, 0) + 1
    except Exception as exc:
        print("no clinician mapping at index:", index, "-", exc)
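The patient and clinician halves of the loop above are nearly identical, so the counting can be factored into one function. This is a minimal sketch: `count_entity_groups` is a hypothetical helper, and `fake_ner` is a toy stand-in for `pipe` so the example runs without downloading a model.

```python
from collections import Counter


def count_entity_groups(texts, ner):
    """Tally entity_group labels over all texts; `ner` is any callable like `pipe`."""
    counts = Counter()
    for text in texts:
        for entity in ner(text):
            counts[entity["entity_group"]] += 1
    return counts


def fake_ner(text):
    # Hypothetical stand-in for `pipe`: tags two fixed keywords
    lexicon = {"fever": "Sign_symptom", "throat": "Biological_structure"}
    return [{"entity_group": label, "word": word}
            for word, label in lexicon.items() if word in text]


patient_texts = ["sore throat and fever", "fever for two days"]
patient_counts = count_entity_groups(patient_texts, fake_ner)
print(patient_counts.most_common())
```

`Counter.most_common()` also returns the counts sorted in descending order, which removes the need for a separate sorting step.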


Analyzing the patient entity counts

In [16]:
print(patient_entity_dict)

{'Biological_structure': 348, 'Sign_symptom': 937, 'Detailed_description': 326, 'Duration': 164, 'Severity': 86, 'Nonbiological_location': 191, 'Disease_disorder': 389, 'Coreference': 44, 'Clinical_event': 114, 'Subject': 137, 'Date': 134, 'Activity': 91, 'Medication': 470, 'Lab_value': 177, 'Outcome': 19, 'Frequency': 55, 'Age': 61, 'Dosage': 59, 'Diagnostic_procedure': 316, 'Time': 54, 'Texture': 4, 'History': 87, 'Therapeutic_procedure': 91, 'Sex': 6, 'Quantitative_concept': 11, 'Administration': 23, 'Other_event': 8, 'Occupation': 13, 'Distance': 5, 'Other_entity': 7, 'Color': 3, 'Family_history': 7, 'Personal_background': 3}


In [17]:
print(clinician_entity_dict)

{'Biological_structure': 260, 'Sign_symptom': 623, 'Disease_disorder': 481, 'Medication': 442, 'Detailed_description': 427, 'Date': 25, 'Clinical_event': 200, 'Lab_value': 176, 'Duration': 73, 'Diagnostic_procedure': 430, 'Severity': 53, 'Activity': 264, 'Administration': 49, 'Therapeutic_procedure': 180, 'Frequency': 26, 'Coreference': 35, 'Family_history': 10, 'Nonbiological_location': 139, 'Subject': 47, 'Time': 13, 'History': 39, 'Other_event': 15, 'Age': 12, 'Dosage': 24, 'Outcome': 7, 'Sex': 4, 'Other_entity': 13, 'Occupation': 3, 'Distance': 6, 'Quantitative_concept': 1, 'Texture': 2, 'Personal_background': 3}


Finding the total number of mapped entities

In [18]:
print("number of patient data:", sum(patient_entity_dict.values()))
print("number of clinician data:", sum(clinician_entity_dict.values()))

number of patient data: 4440
number of clinician data: 4082


Since there are 480 dialogues in total, the average number of entities mapped per dialogue in the patient data is:

In [19]:
4440/480

9.25

For clinician data:

In [20]:
4082/480

8.504166666666666
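The two averages above use hard-coded totals. A minimal sketch of computing the same ratio from a count dictionary is given below; `mean_entities_per_dialogue` and `toy_counts` are hypothetical names used only for illustration.

```python
def mean_entities_per_dialogue(entity_counts, n_dialogues):
    """Average number of mapped entities per dialogue."""
    if n_dialogues <= 0:
        raise ValueError("n_dialogues must be positive")
    return sum(entity_counts.values()) / n_dialogues


# Toy counts: 9 entities over 4 dialogues
toy_counts = {"Sign_symptom": 6, "Medication": 3}
toy_mean = mean_entities_per_dialogue(toy_counts, 4)
print(toy_mean)
```

In the notebook this would be called as `mean_entities_per_dialogue(patient_entity_dict, len(df_mini_pairs))`, so the result stays correct if the data changes.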

Sorting based on values

In [21]:
patient_entity_dict_sorted = sorted(patient_entity_dict.items(), key=lambda item: item[1], reverse=True)
patient_entity_dict_sorted

[('Sign_symptom', 937),
 ('Medication', 470),
 ('Disease_disorder', 389),
 ('Biological_structure', 348),
 ('Detailed_description', 326),
 ('Diagnostic_procedure', 316),
 ('Nonbiological_location', 191),
 ('Lab_value', 177),
 ('Duration', 164),
 ('Subject', 137),
 ('Date', 134),
 ('Clinical_event', 114),
 ('Activity', 91),
 ('Therapeutic_procedure', 91),
 ('History', 87),
 ('Severity', 86),
 ('Age', 61),
 ('Dosage', 59),
 ('Frequency', 55),
 ('Time', 54),
 ('Coreference', 44),
 ('Administration', 23),
 ('Outcome', 19),
 ('Occupation', 13),
 ('Quantitative_concept', 11),
 ('Other_event', 8),
 ('Other_entity', 7),
 ('Family_history', 7),
 ('Sex', 6),
 ('Distance', 5),
 ('Texture', 4),
 ('Color', 3),
 ('Personal_background', 3)]

In [22]:
clinician_entity_dict_sorted = sorted(clinician_entity_dict.items(), key=lambda item: item[1], reverse=True)
clinician_entity_dict_sorted

[('Sign_symptom', 623),
 ('Disease_disorder', 481),
 ('Medication', 442),
 ('Diagnostic_procedure', 430),
 ('Detailed_description', 427),
 ('Activity', 264),
 ('Biological_structure', 260),
 ('Clinical_event', 200),
 ('Therapeutic_procedure', 180),
 ('Lab_value', 176),
 ('Nonbiological_location', 139),
 ('Duration', 73),
 ('Severity', 53),
 ('Administration', 49),
 ('Subject', 47),
 ('History', 39),
 ('Coreference', 35),
 ('Frequency', 26),
 ('Date', 25),
 ('Dosage', 24),
 ('Other_event', 15),
 ('Time', 13),
 ('Other_entity', 13),
 ('Age', 12),
 ('Family_history', 10),
 ('Outcome', 7),
 ('Distance', 6),
 ('Sex', 4),
 ('Occupation', 3),
 ('Personal_background', 3),
 ('Texture', 2),
 ('Quantitative_concept', 1)]

Finding the set differences

In [23]:
set(patient_entity_dict.keys()).difference(set(clinician_entity_dict.keys()))

{'Color'}

In [24]:
set(clinician_entity_dict.keys()).difference(set(patient_entity_dict.keys()))

set()

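We can see that the two label sets are nearly identical: only `Color` appears in the patient text but not in the clinician text, and no entity group is exclusive to the clinician text.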

# Finding mismatches: looping through the mini data to find entity mismatches within the paired data

Now we can perform a different type of analysis. We know that the total counts differ between patient and clinician text, so what are the differences in occurrence per dialogue? We reuse the list of entities found earlier.

In [25]:
clinician_entity_dict.keys()

dict_keys(['Biological_structure', 'Sign_symptom', 'Disease_disorder', 'Medication', 'Detailed_description', 'Date', 'Clinical_event', 'Lab_value', 'Duration', 'Diagnostic_procedure', 'Severity', 'Activity', 'Administration', 'Therapeutic_procedure', 'Frequency', 'Coreference', 'Family_history', 'Nonbiological_location', 'Subject', 'Time', 'History', 'Other_event', 'Age', 'Dosage', 'Outcome', 'Sex', 'Other_entity', 'Occupation', 'Distance', 'Quantitative_concept', 'Texture', 'Personal_background'])

In [26]:
# initialize the dictionaries of mismatch counts
patient_mismatch_dict = {}  # entities mapped in the patient text but not in the clinician text
clinician_mismatch_dict = {}  # entities mapped in the clinician text but not in the patient text
for key in patient_entity_dict:
    patient_mismatch_dict[key] = 0
for key in clinician_entity_dict:
    clinician_mismatch_dict[key] = 0


In [27]:
# track the row indices with a patient-side mismatch
patient_mismatch_rows = []
# track the row indices with a clinician-side mismatch
clinician_mismatch_rows = []

for index, row in df_mini_pairs.iterrows():
    current_patient_entities = set()
    current_clinician_entities = set()
    utterances = row["utterances"]

    try:
        patient = utterances[0].replace("patient:", "").strip()
        patient_results = pipe(patient)
        # collect the distinct entity groups found in the patient text
        for entity in patient_results:  # `entity` avoids shadowing the built-in `dict`
            current_patient_entities.add(entity["entity_group"])
    except Exception as exc:  # report the error instead of silently hiding it
        print("no patient mapping at index:", index, "-", exc)

    try:
        clinician = utterances[1].replace("doctor:", "").strip()
        clinician_results = pipe(clinician)
        for entity in clinician_results:
            current_clinician_entities.add(entity["entity_group"])
    except Exception as exc:
        print("no clinician mapping at index:", index, "-", exc)

    # find mismatches by computing set differences
    # entities mapped in the patient text but not in the clinician text
    patient_mismatch = current_patient_entities - current_clinician_entities
    if len(patient_mismatch) > 0:
        patient_mismatch_rows.append(index)  # this row has at least one patient-only entity group
    for key in patient_mismatch:
        patient_mismatch_dict[key] += 1

    # entities mapped in the clinician text but not in the patient text
    clinician_mismatch = current_clinician_entities - current_patient_entities
    if len(clinician_mismatch) > 0:
        clinician_mismatch_rows.append(index)  # this row has at least one clinician-only entity group
    for key in clinician_mismatch:
        clinician_mismatch_dict[key] += 1
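To make the per-dialogue set difference concrete, here is a minimal sketch on three hypothetical dialogues, each given as a pair of (patient entity groups, clinician entity groups). The toy data and variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical per-dialogue entity groups: (patient set, clinician set)
dialogues = [
    ({"Sign_symptom", "Duration"}, {"Sign_symptom", "Medication"}),
    ({"Age"}, {"Age"}),
    ({"Sign_symptom"}, set()),
]

patient_only = Counter()    # groups seen in patient text but not clinician text
clinician_only = Counter()  # groups seen in clinician text but not patient text
patient_rows = []           # dialogue indices with at least one patient-only group

for i, (patient_groups, clinician_groups) in enumerate(dialogues):
    diff = patient_groups - clinician_groups
    if diff:
        patient_rows.append(i)
    patient_only.update(diff)
    clinician_only.update(clinician_groups - patient_groups)

print(dict(patient_only), dict(clinician_only), patient_rows)
```

Each entity group is counted at most once per dialogue, because the comparison works on sets rather than on raw entity lists.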


Printing the total number of mismatches (summed over all entity groups)

In [28]:
print("number of patient data:", sum(patient_mismatch_dict.values()))
print("number of clinician data:", sum(clinician_mismatch_dict.values()))

number of patient data: 1553
number of clinician data: 1320


Number of rows with mismatches, as percentages

In [29]:
print("percent of rows with patient data mismatches:", len(patient_mismatch_rows)/len(df_mini_pairs)*100)
print("percent of rows with clinician data mismatches:", len(clinician_mismatch_rows)/len(df_mini_pairs)*100)

percent of rows with patient data mismatches: 92.08333333333333
percent of rows with clinician data mismatches: 89.375


Printing out the mismatch dictionaries, sorted by count

In [30]:
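# store the sorted list under a new name so the dictionary is not overwritten if the cell is re-run
patient_mismatch_sorted = sorted(patient_mismatch_dict.items(), key=lambda item: item[1], reverse=True)
patient_mismatch_sorted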

[('Sign_symptom', 135),
 ('Disease_disorder', 111),
 ('Biological_structure', 95),
 ('Duration', 94),
 ('Medication', 94),
 ('Nonbiological_location', 91),
 ('Diagnostic_procedure', 91),
 ('Detailed_description', 86),
 ('Lab_value', 80),
 ('Subject', 79),
 ('Date', 79),
 ('Severity', 58),
 ('Age', 55),
 ('History', 54),
 ('Clinical_event', 46),
 ('Time', 44),
 ('Activity', 43),
 ('Frequency', 40),
 ('Therapeutic_procedure', 37),
 ('Coreference', 29),
 ('Dosage', 24),
 ('Outcome', 17),
 ('Administration', 15),
 ('Quantitative_concept', 11),
 ('Occupation', 11),
 ('Other_event', 8),
 ('Sex', 6),
 ('Other_entity', 6),
 ('Texture', 4),
 ('Distance', 3),
 ('Color', 3),
 ('Family_history', 3),
 ('Personal_background', 1)]

In [31]:
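# store the sorted list under a new name so the dictionary is not overwritten if the cell is re-run
clinician_mismatch_sorted = sorted(clinician_mismatch_dict.items(), key=lambda item: item[1], reverse=True)
clinician_mismatch_sorted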

[('Detailed_description', 147),
 ('Activity', 138),
 ('Clinical_event', 120),
 ('Diagnostic_procedure', 107),
 ('Disease_disorder', 106),
 ('Medication', 99),
 ('Therapeutic_procedure', 85),
 ('Nonbiological_location', 76),
 ('Lab_value', 74),
 ('Sign_symptom', 62),
 ('Biological_structure', 47),
 ('Duration', 36),
 ('Severity', 36),
 ('Administration', 27),
 ('Coreference', 22),
 ('Subject', 19),
 ('Frequency', 15),
 ('Date', 13),
 ('History', 13),
 ('Other_event', 13),
 ('Time', 12),
 ('Dosage', 11),
 ('Age', 9),
 ('Other_entity', 9),
 ('Family_history', 7),
 ('Distance', 6),
 ('Outcome', 5),
 ('Texture', 2),
 ('Sex', 1),
 ('Occupation', 1),
 ('Quantitative_concept', 1),
 ('Personal_background', 1)]

## Looping to extract a set of terms, not entities

Finding the set of medical terms (the matched words themselves), rather than entity groups. This set could be large.


In [32]:
patient_entity_terms = {}
clinician_entity_terms = {}
for index, row in df_mini_pairs.iterrows():
    utterances = row["utterances"]

    try:
        patient = utterances[0].replace("patient:", "").strip()
        patient_results = pipe(patient)
        # loop through the entities and count each matched term
        for entity in patient_results:  # `entity` avoids shadowing the built-in `dict`
            curr_term = entity["word"]
            patient_entity_terms[curr_term] = patient_entity_terms.get(curr_term, 0) + 1
    except Exception as exc:  # report the error instead of silently hiding it
        print("no patient mapping at index:", index, "-", exc)

    try:
        clinician = utterances[1].replace("doctor:", "").strip()
        clinician_results = pipe(clinician)
        for entity in clinician_results:
            curr_term = entity["word"]
            clinician_entity_terms[curr_term] = clinician_entity_terms.get(curr_term, 0) + 1
    except Exception as exc:
        print("no clinician mapping at index:", index, "-", exc)


Printing the size of the patient and clinician term sets, then comparing them with set similarity

In [33]:
print(len(patient_entity_terms))
patient_entity_terms

2118


{'throat': 56,
 'sore': 50,
 'cold': 14,
 'over a week': 6,
 'low': 10,
 'grade': 4,
 'fever': 69,
 'for the past two days': 2,
 'dizzy': 5,
 'dr': 16,
 'tight': 12,
 'painful': 2,
 'chest': 47,
 'dry': 28,
 'headache': 36,
 'coronavirus': 43,
 '19': 25,
 'high': 7,
 'delivered': 1,
 'daughter': 16,
 'gave': 1,
 'birth': 1,
 '38 weeks': 1,
 '##rest': 1,
 'weak': 3,
 'ce': 6,
 'miscarried': 1,
 'pro': 3,
 '##ges': 1,
 '##one': 1,
 'mis': 1,
 'progesterone': 1,
 'every': 5,
 'virus': 4,
 'talking': 2,
 'dirrhea': 1,
 '36': 1,
 '10 month old': 2,
 'son': 15,
 'me': 2,
 '##as': 1,
 '##les': 1,
 'vaccination': 1,
 'due': 1,
 'coronavirus pandemic': 1,
 'wife': 9,
 '1': 9,
 '##month': 1,
 'hem': 2,
 'positive': 7,
 'blood': 4,
 'culture': 1,
 'er': 5,
 '##yt': 1,
 '##hr': 1,
 '##omycin': 1,
 '##day': 1,
 'cl': 4,
 '##indamycin': 1,
 'lev': 6,
 'pre': 3,
 '##d': 4,
 '##nis': 2,
 'thru': 1,
 'medications': 4,
 'evenings': 3,
 'np': 1,
 'co': 71,
 '##vid 19': 10,
 'slight': 15,
 'cough': 71,
 '

In [34]:
print(len(clinician_entity_terms))
clinician_entity_terms

1538


{'throat': 51,
 'pain': 32,
 'sore': 13,
 'co': 50,
 '##vid - 19': 13,
 'infections': 27,
 'st': 17,
 'cough': 54,
 '##vid': 31,
 'persistent': 8,
 'short': 5,
 'fever': 65,
 'dry': 19,
 'sob': 15,
 'age': 2,
 'com': 1,
 '##plication': 1,
 'call': 19,
 'symptoms': 50,
 'tired': 2,
 'worse': 9,
 'over 5 - 14 days': 1,
 'in': 5,
 '##fective': 1,
 'self': 4,
 'isolation': 4,
 'hygiene': 1,
 'covid - 19': 14,
 'covid': 6,
 '19': 30,
 'high': 19,
 'risk': 4,
 'video': 129,
 'text chat': 53,
 'pneumonia': 86,
 'pregnancy': 1,
 'antibiotics': 30,
 'consult': 43,
 'drink': 8,
 'oral': 11,
 'hydra': 7,
 'covid 19': 6,
 'airborne': 1,
 'probiotic': 1,
 'n95': 1,
 'mask': 6,
 'gown': 1,
 'eye protection': 1,
 'corona': 25,
 'va': 1,
 '##nation': 1,
 'flu': 28,
 '##cca': 2,
 'improve': 6,
 'immunity': 2,
 'exercise': 1,
 'daily': 3,
 'she': 4,
 'for a week': 1,
 'better': 3,
 '##vid testing': 3,
 'jo': 1,
 'sluim': 1,
 'ge': 3,
 'allergy': 2,
 'sl': 1,
 '##uim': 1,
 'b': 3,
 '##k': 1,
 'gee': 1,
 

Computing Jaccard similarity

In [35]:
pghd_set = set(patient_entity_terms.keys())
clinician_set = set(clinician_entity_terms.keys())

jaccard = len(pghd_set & clinician_set) / len(pghd_set | clinician_set)
print(f"Conceptual Overlap (Jaccard): {jaccard:.3f}")

Conceptual Overlap (Jaccard): 0.163
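The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|, so a value of 0.163 means roughly one in six distinct terms is shared. A minimal sketch as a reusable function follows; the `jaccard` helper and the toy term sets are hypothetical, and returning 0.0 for two empty sets is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two iterables of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Toy term sets: 2 shared terms out of 5 distinct terms overall
patient_terms = {"fever", "cough", "sore", "throat"}
clinician_terms = {"fever", "cough", "pneumonia"}
score = jaccard(patient_terms, clinician_terms)
print(f"{score:.3f}")
```

Because the term sets in this notebook contain word-piece fragments such as `##vid`, the overlap is measured on raw token strings, which may understate the conceptual overlap.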


In [36]:
print(pghd_set-clinician_set)

{'pnemonia', 'rf', 'week and a', 'xray', 'preschool', 'soft', 'discharged', 'moved', 'flew', '35. 3c', 'azrithomycin', 'armpit', 'el', 'arm', 'mouth reading', 'takes', '##vid t', 'strength', 'older brothers', 'gave', 'one day out', '##scopic fund', 'speticemia', 'proventi', 'copd', 'aortic arch', 'grocery', 'cough n', 'thin', 'earlier', '38 — 39 degrees celsius', '##hb', 'diagnosis', 'eating well', '3 year old', 'oxygen levels', '3rd day', 'covid 1o', 'bloody', 'went', '##loxacin', 'ing', '##ar throat', 'hair wet', 'physiotherapy', '500 mg', 'knock', 'itching', 'breathing problems', '##ray', 'leak', 'everyday', '##zziness', '98 30mg', 'tuesday night', 'past 2 weeks', '##pinavir', 'wellstar douglas hospital', '##pa', 'nurse', 'been', 'upper body', 'y', 'go down', 'passed away', 'muscle ache', '15', '45 years', 'knee', 'grade', '##1 diabetes', 'one week ago', '81 years old', 'big', '103.', 'covid 19 virus', '##ula', 'sacks', 'long ago', 'b. p', '##ng', 'no short of breath', 'hepa', 'stuf

## future:

Show (through concepts per 100 words) that the QuickUMLS ontology has better coverage
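One possible way to compute a concepts-per-100-words figure is sketched below. It assumes whitespace tokenization for the word count and takes the concept count as an input, so it is independent of which extractor produced the concepts; `concepts_per_100_words` is a hypothetical helper name.

```python
def concepts_per_100_words(text, n_concepts):
    """Concept density: mapped concepts per 100 whitespace-separated words."""
    n_words = len(text.split())
    if n_words == 0:
        return 0.0  # avoid division by zero on empty text
    return 100.0 * n_concepts / n_words


# Toy example: 2 concepts in a 10-word utterance
sample = "throat a bit sore and want a good immune booster"
density = concepts_per_100_words(sample, 2)
print(density)
```

Normalizing by length makes patient and clinician utterances comparable even when one side tends to write longer messages.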