# Dataset and Library
The sample data are [snippets](https://escholarship.org/uc/uciem_cpcem) from *Clinical Practice and Cases in Emergency Medicine* case reports. The models are [transformer](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) models from the Python library [Hugging Face Transformers](https://huggingface.co/transformers).

In [1]:
%%capture
!pip install transformers

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [3]:
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

In [16]:
input = "old male presents with Tendon xanthomatosis and Corneal arcus eg pain, especially when walking or standing chest pain.tightness or pressure in the chest and shortness of breath.pain, tightness, and pressure in the neck, jaw, shoulders, and back.indigestion and heartburn.sleep problems and daytime exhaustion.dizziness.heart palpitations.."

In [17]:
tokenized_input = tokenizer(input, return_tensors="pt")
output = model(**tokenized_input)

In [18]:
predictions = torch.sigmoid(output.logits)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]

In [19]:
output

SequenceClassifierOutput(loss=None, logits=tensor([[-8.8515, -9.5654, -9.6420,  ..., -9.7807, -7.4112, -9.3728]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [20]:
predictions

tensor([[1.4315e-04, 7.0108e-05, 6.4942e-05,  ..., 5.6528e-05, 6.0406e-04,
         8.4994e-05]], grad_fn=<SigmoidBackward0>)
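
Each entry of `predictions` is an independent per-label probability: the sigmoid maps every logit to (0, 1) on its own, so several labels can clear the threshold at once. As a minimal stdlib-only sketch (the helpers `sigmoid` and `labels_above` and the three-entry `toy_id2label` map are illustrative and not part of the notebook), the first logit printed above can be checked by hand:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def labels_above(logits, id2label, threshold=0.3):
    """Return the labels whose sigmoid probability exceeds the threshold."""
    return [id2label[i] for i, z in enumerate(logits) if sigmoid(z) > threshold]

# First logit from the output above; compare with predictions[0, 0].
p_first = sigmoid(-8.8515)
print(f"{p_first:.4e}")

# Hypothetical logits and label map, for illustration only.
toy_logits = [-8.8515, 0.5, -0.9]
toy_id2label = {0: "041", 1: "428", 2: "heart"}
print(labels_above(toy_logits, toy_id2label))
```

A threshold of 0.3 on the probability corresponds to a logit of about -0.85, which is why the third toy logit is excluded.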

In [21]:
predicted_labels

['041',
 '278',
 '2780',
 '428',
 '998',
 'abnormal',
 'abscess',
 'accident',
 'acute',
 'apnea',
 'chronic',
 'failure',
 'heart',
 'infection',
 'later',
 'mention',
 'misadventure',
 'obesity',
 'patient',
 'procedure',
 'reaction',
 'site',
 'specified',
 'surgical',
 'time',
 'unspecified',
 'without']
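
The list mixes numeric codes with plain words. A minimal sketch, assuming the numeric labels are ICD-9 codes and the words are tokens from code descriptions (an assumption about the label set, not something the notebook states), separates the two groups. `split_labels` is a hypothetical helper:

```python
def split_labels(labels):
    """Split predicted labels into numeric codes and description words."""
    codes = [label for label in labels if label.isdigit()]
    words = [label for label in labels if not label.isdigit()]
    return codes, words

# A subset of the labels printed above.
sample = ["041", "278", "2780", "428", "998", "abnormal", "heart", "obesity"]
codes, words = split_labels(sample)

# Keep only three-character codes; assumed here to be the category level.
three_digit = [code for code in codes if len(code) == 3]
print(three_digit)
print(words)
```

Labels that start with a letter, if the label set contains any, would land in `words` under this rule and would need separate handling.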

In [None]:
input2 = "We present a case of a 19-year-old female presenting with intermittent chest pain, palpitations, and weakness present for two months. The patient had previously been evaluated at our emergency department one week earlier ."

In [None]:
tokenized_input2 = tokenizer(input2, return_tensors="pt")
output2 = model(**tokenized_input2)

In [None]:
predictions2 = torch.sigmoid(output2.logits)
predicted_labels2 = [model.config.id2label[_id] for _id in (predictions2 > 0.3).nonzero()[:, 1].tolist()]

In [None]:
output2

SequenceClassifierOutput(loss=None, logits=tensor([[ -9.5360, -10.4061, -11.0893,  ..., -10.9782,  -6.7574,  -9.8357]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [None]:
predictions2

tensor([[7.2203e-05, 3.0246e-05, 1.5274e-05,  ..., 1.7069e-05, 1.1609e-03,
         5.3503e-05]], grad_fn=<SigmoidBackward0>)

In [None]:
predicted_labels2

['285',
 '2859',
 '424',
 '428',
 '4280',
 '780',
 'anemia',
 'artery',
 'congestive',
 'disorders',
 'heart',
 'specified',
 'unspecified']

In [None]:
input3 = "A 60-year-old female presented with sudden onset visual disturbance in her right eye"

In [None]:
tokenized_input3 = tokenizer(input3, return_tensors="pt")
output3 = model(**tokenized_input3)

In [None]:
predictions3 = torch.sigmoid(output3.logits)
predicted_labels3 = [model.config.id2label[_id] for _id in (predictions3 > 0.3).nonzero()[:, 1].tolist()]

In [None]:
predicted_labels3

['affecting', 'cerebral', 'disorder', 'infarction', 'type', 'unspecified']

In [None]:
input4 = """We report the case of a 33-year-old, previously healthy male who presented to the emergency department for evaluation of transient, right-sided facial droop with visual changes. 
His symptoms began 12 days after falling off a scooter. Imaging revealed an extracranial internal CAD."""

In [None]:
tokenized_input4 = tokenizer(input4, return_tensors="pt")
output4 = model(**tokenized_input4)

In [None]:
predictions4 = torch.sigmoid(output4.logits)
predicted_labels4 = [model.config.id2label[_id] for _id in (predictions4 > 0.3).nonzero()[:, 1].tolist()]

In [None]:
predicted_labels4

['348', '780', 'cerebral', 'mention', 'specified', 'unspecified', 'without']

In [None]:
################################################################################################################################################

In [1]:
%%capture
!pip install transformers
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

In [2]:
import pandas as pd

# Define a function to predict diagnosis from a single sign or symptom
def predict_diagnosis(input):
    # Tokenize the input
    tokenized_input = tokenizer(input, return_tensors="pt")
    # Get the model output
    output = model(**tokenized_input)
    # Convert the output to predicted labels
    predictions = torch.sigmoid(output.logits)
    predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.1).nonzero()[:, 1].tolist()]
    return predicted_labels

# Define a list of signs and symptoms
inputs = [
    " sever tachycardia"
]

# Predict the diagnosis for each input and store in a list of dictionaries
rows = []
for input in inputs:
    predicted_labels = predict_diagnosis(input)
    row = {"Input": input}
    for i, label in enumerate(predicted_labels):
        row[f"Diagnosis {i+1}"] = label
    rows.append(row)

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(rows)
input_df = df
# Print the dataframe
print(df)

                Input Diagnosis 1 Diagnosis 2 Diagnosis 3 Diagnosis 4  \
0   sever tachycardia         038         250        2724         276   

  Diagnosis 5 Diagnosis 6 Diagnosis 7 Diagnosis 8 Diagnosis 9  ...  \
0        2766         285        2859         288         401  ...   

  Diagnosis 80 Diagnosis 81 Diagnosis 82 Diagnosis 83 Diagnosis 84  \
0       stated       status  tachycardia      tubular         type   

  Diagnosis 85  Diagnosis 86 Diagnosis 87 Diagnosis 88 Diagnosis 89  
0        ulcer  uncontrolled  unspecified          use      without  

[1 rows x 90 columns]


In [3]:
df

| | Input | Diagnosis 1 | Diagnosis 2 | Diagnosis 3 | Diagnosis 4 | Diagnosis 5 | Diagnosis 6 | Diagnosis 7 | Diagnosis 8 | Diagnosis 9 | ... | Diagnosis 80 | Diagnosis 81 | Diagnosis 82 | Diagnosis 83 | Diagnosis 84 | Diagnosis 85 | Diagnosis 86 | Diagnosis 87 | Diagnosis 88 | Diagnosis 89 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sever tachycardia | 038 | 250 | 2724 | 276 | 2766 | 285 | 2859 | 288 | 401 | ... | stated | status | tachycardia | tubular | type | ulcer | uncontrolled | unspecified | use | without |
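
Storing each label in its own `Diagnosis N` column gives one very wide row per input. An alternative, sketched here in plain Python with a hypothetical `to_long_rows` helper, keeps one record per (input, label) pair. The resulting list can be passed to `pd.DataFrame` in the same way as `rows` above.

```python
def to_long_rows(text, labels):
    """Build one record per predicted label, with its 1-based position."""
    return [
        {"Input": text.strip(), "Position": position, "Label": label}
        for position, label in enumerate(labels, start=1)
    ]

# A few of the labels from the output above, for illustration.
long_rows = to_long_rows(" sever tachycardia", ["038", "250", "2724"])
for record in long_rows:
    print(record)
```

`Position` reflects the order in which the labels are returned, which follows the label index rather than the predicted probability.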




In [19]:
import pandas as pd

# Define a function to predict diagnosis from a single sign or symptom
def predict_diagnosis(input):
    # Tokenize the input
    tokenized_input = tokenizer(input, return_tensors="pt")
    # Get the model output
    output = model(**tokenized_input)
    # Convert the output to predicted labels
    predictions = torch.sigmoid(output.logits)
    predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.1).nonzero()[:, 1].tolist()]
    return predicted_labels

# Define a list of signs and symptoms
inputs = [
    " John Doe complains of difficulty seeing especially at night He also reports having headaches and mild eye pain History of Present Illness:John Doe has a history of open angle glaucoma that was diagnosed 10 years ago He has been regularly seeing his ophthalmologist and has been taking medication to control his intraocular pressure However  in the last few weeks  he has been experiencing worsening of his symptoms He reports that his vision has become blurry and he has difficulty seeing at night He also experiences headaches and mild eye pain.Past Medical History John Doe has a history of hypertension hyperlipidemia and type 2 diabetes He underwent cataract surgery in his right eye five years ago "

]

# Predict the diagnosis for each input and store in a list of dictionaries
rows = []
for input in inputs:
    predicted_labels = predict_diagnosis(input)
    row = {"Input": input}
    for i, label in enumerate(predicted_labels):
        row[f"Diagnosis {i+1}"] = label
    rows.append(row)

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(rows)
input_df = df

# Load the mapping table
mapping_table = pd.read_excel("/content/diseases_2.xlsx")
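# Strip every non-digit character from the codes; regex=True makes the pattern explicit
mapping_table['Code'] = mapping_table['Code'].astype(str).str.replace(r'[^\d]+', '', regex=True)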

# Define a function to get the diagnosis description from the mapping table
def get_diagnosis_description(code):
    row = mapping_table[mapping_table['Code'] == str(code)]
    return row.iloc[0]['Description'] if not row.empty else ''

# Define a function to map diagnosis codes to their descriptions
def map_diagnosis_codes(input_df, mapping_table):
    """
    Maps diagnosis codes in an input DataFrame to their corresponding descriptions using a mapping table.
    """
    # Make a copy of the input DataFrame to avoid modifying the original
    mapped_df = input_df.copy()
    
    # Iterate over the columns of the input DataFrame and map diagnosis codes to their descriptions
    for col in input_df.columns:
        if col == 'Input':
            continue
        mapped_df[col] = input_df[col].astype(str).apply(lambda x: get_diagnosis_description(int(x)) if x.isnumeric() else x)
    
    return mapped_df

# Apply the mapping function to the input DataFrame
mapped_df = map_diagnosis_codes(input_df, mapping_table)

# Print the mapped DataFrame
print(mapped_df)




                                               Input  \
0   John Doe complains of difficulty seeing espec...   

                  Diagnosis 1  \
0  Melanocytic naevi of trunk   

                                         Diagnosis 2  Diagnosis 3  \
0  Motorcycle rider injured in collision with two...  Spirillosis   

                     Diagnosis 4                             Diagnosis 5  \
0  Haemarthrosis, multiple sites  Haemarthrosis, pelvic region and thigh   

         Diagnosis 6                                        Diagnosis 7  \
0  Addisonian crisis  Motorcycle rider injured in collision with fix...   

                    Diagnosis 8  \
0  Other forms of leptospirosis   

                                         Diagnosis 9  ... Diagnosis 58  \
0  Contact with nonpowered hand tool, other speci...  ...  replacement   

  Diagnosis 59 Diagnosis 60 Diagnosis 61 Diagnosis 62 Diagnosis 63  \
0  retinopathy       senile       stated     surgical         type   

   Diagnosis 64 

In [20]:
mapped_df

| | Input | Diagnosis 1 | Diagnosis 2 | Diagnosis 3 | Diagnosis 4 | Diagnosis 5 | Diagnosis 6 | Diagnosis 7 | Diagnosis 8 | Diagnosis 9 | ... | Diagnosis 58 | Diagnosis 59 | Diagnosis 60 | Diagnosis 61 | Diagnosis 62 | Diagnosis 63 | Diagnosis 64 | Diagnosis 65 | Diagnosis 66 | Diagnosis 67 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | John Doe complains of difficulty seeing espec... | Melanocytic naevi of trunk | Motorcycle rider injured in collision with two... | Spirillosis | Haemarthrosis, multiple sites | Haemarthrosis, pelvic region and thigh | Addisonian crisis | Motorcycle rider injured in collision with fix... | Other forms of leptospirosis | Contact with nonpowered hand tool, other speci... | ... | replacement | retinopathy | senile | stated | surgical | type | uncontrolled | unspecified | use | without |
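
One detail of the lookup above is worth checking: `int(x)` turns a label such as `'038'` into `38`, so a table keyed by zero-padded codes would not match it. A minimal sketch, assuming a mapping keyed by the code string exactly as the model emits it (the three-entry `icd9_descriptions` dict is an illustrative stand-in for the spreadsheet, and `describe` is a hypothetical helper):

```python
# Illustrative stand-in for the mapping spreadsheet; keys keep leading zeros.
icd9_descriptions = {
    "038": "Septicemia",
    "250": "Diabetes mellitus",
    "428": "Heart failure",
}

def describe(label, table):
    """Return the description for a numeric code, or the label unchanged."""
    if label.isdigit():
        return table.get(label, "")
    return label

print(describe("038", icd9_descriptions))
print(str(int("038")))  # the integer round trip drops the leading zero
print(describe("heart", icd9_descriptions))
```

Comparing strings on both sides also avoids mixing code systems by accident, since a code is only matched when its full text is present in the table.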
