**Section 1: Install OpenAI**

In [18]:
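# The cells below use the legacy openai.ChatCompletion interface, which was removed in openai 1.0.
!pip install "openai<1.0"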



**Section 2: Load Libraries**

In [19]:
import pandas as pd
import numpy as np
import json
import time
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score, RocCurveDisplay
from torch.utils.data import Dataset, DataLoader
import openai

**Section 3: Initialize the OpenAI Client with my API Key**
Note: The key is masked in the final code for privacy reasons.

In [None]:
openai.api_key = 'xxxxxxxx' # Key masked for privacy. Replace it with your own API key.
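
As an alternative to pasting the key into the notebook, it can be read from an environment variable so that it never appears in the saved file. The sketch below is only an illustration: the helper name `load_api_key` is hypothetical, and the variable name `OPENAI_API_KEY` is assumed as a common convention.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable (assumed name)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key

# Usage in the notebook (not executed here):
# openai.api_key = load_api_key()
```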

**Section 3: Chain-of-thought (CoT) Approach**

In [21]:
prompt_cot = """
Patient reports sneezing, itchy eyes, and a runny nose.

Let's think step-by-step about what this could indicate.
"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": prompt_cot}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])

Step 1: Identifying the symptoms - The patient is experiencing sneezing, itchy eyes, and a runny nose. These are common symptoms of an allergic reaction or a respiratory infection.

Step 2: Considering common conditions - All of these symptoms are typical of allergic rhinitis, also known as hay fever, which is an allergic response to certain allergens, such as pollen, dust mites, or pet dander. It could also indicate a common cold or flu, which are viral infections.

Step 3: Considering other factors - If the patient's symptoms are seasonal or occur after exposure to certain environments or substances, an allergy is more likely. If the patient also has a fever or body aches, a viral infection like the flu may be more likely.

Step 4: Next steps - The patient should be asked about other symptoms, the duration and timing of these symptoms, and any known allergies. A physical examination may also be performed. An allergy test could help confirm an allergic reaction, while a nasal or throa
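
Because the chain-of-thought reply is organized as "Step N: ..." paragraphs, it can be split into individual steps for later inspection. This is a minimal sketch that assumes the model keeps this header format, which is not guaranteed; `split_cot_steps` is a hypothetical helper, not part of the notebook.

```python
import re

def split_cot_steps(text):
    """Split a 'Step N: ...' style response into a list of step strings."""
    # Split on each 'Step N:' header found at the start of a line.
    parts = re.split(r"(?m)^\s*Step\s+\d+\s*:\s*", text)
    # Drop the empty leading piece and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

sample = (
    "Step 1: Identifying the symptoms - sneezing, itchy eyes.\n\n"
    "Step 2: Considering common conditions - allergic rhinitis."
)
steps = split_cot_steps(sample)
```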

**Section 4: Tree-of-thought (ToT) Approach**

In [22]:
prompt_tot = """
The patient has these symptoms: shortness of breath, hives, and swelling after eating peanuts.

Think of three possible conditions this could be and explain each.

Then pick the most likely one and justify your reasoning.
"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": prompt_tot}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])


1. Peanut Allergy: This is a type of food allergy where the immune system overreacts to proteins in peanuts. Symptoms can range from mild, such as hives and itching, to severe, such as shortness of breath and swelling, which can cause anaphylaxis - a life-threatening allergic reaction.

2. Asthma: Shortness of breath can be a symptom of asthma, but it's usually accompanied by wheezing and tightness in the chest. The hives and swelling after eating peanuts could be a coincidence and not related to the respiratory issues.

3. Anaphylaxis: This is a severe and potentially life-threatening allergic reaction that can occur within seconds or minutes of exposure to something you're allergic to, such as peanuts. Symptoms include hives, swelling, shortness of breath, low blood pressure, and potential loss of consciousness.

The most likely condition is a Peanut Allergy, specifically leading to Anaphylaxis. The timing of the symptoms (occurring after eating peanuts) as well as the specific sympt
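
The prompt above asks for candidates and a final choice in a single call. One possible variation, sketched here under the assumption that the two stages are sent as separate requests, is to build a candidate-generation prompt and a selection prompt independently. Both function names are hypothetical, and no API call is made in the sketch.

```python
def build_candidate_prompt(symptoms, n_candidates=3):
    """Prompt that asks the model to propose several candidate conditions."""
    symptom_text = ", ".join(symptoms)
    return (
        f"The patient has these symptoms: {symptom_text}.\n\n"
        f"Think of {n_candidates} possible conditions this could be and explain each."
    )

def build_selection_prompt(symptoms, candidates):
    """Prompt that asks the model to choose among previously proposed candidates."""
    symptom_text = ", ".join(symptoms)
    bullet_list = "\n".join(f"- {c}" for c in candidates)
    return (
        f"The patient has these symptoms: {symptom_text}.\n\n"
        f"Candidate conditions:\n{bullet_list}\n\n"
        "Pick the most likely one and justify your reasoning."
    )

symptoms = ["shortness of breath", "hives", "swelling after eating peanuts"]
first_prompt = build_candidate_prompt(symptoms)
second_prompt = build_selection_prompt(symptoms, ["Peanut allergy", "Asthma", "Anaphylaxis"])
```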

**Section 5: Generating One-Shot Prompts for GPT-4 from the Dataset and Predictions**

In [39]:
import pandas as pd
import random
import openai
import re

# 1. Category mapping 
CATEGORY_MAP = {
    "soya bean (substance)": "food",
    "peanut (substance)": "food",
    "shrimp (substance)": "food",
    "penicillin v": "medication",
    "amoxicillin": "medication",
    "mold (organism)": "mold (organism)",
    "animal dander (substance)": "animal dander (substance)"
}

# 2. Normalize categories
def normalize_diag(diagnosis):
    diagnosis = diagnosis.lower().strip()
    return CATEGORY_MAP.get(diagnosis, diagnosis)

# 3. Extract answer from GPT response
def extract_diagnosis(gpt_response):
    match = re.search(r"diagnosis\s*[:\-]?\s*(.+)", gpt_response, re.IGNORECASE)
    return match.group(1).strip() if match else gpt_response.strip()

# 4. Call GPT
def call_gpt(prompt, model="gpt-4"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    return response['choices'][0]['message']['content']

# 5. One-shot generator from dataset
def generate_one_shot_prompt_from_dataset(csv_path, choices=None):
    df = pd.read_csv(csv_path)
    valid = df.dropna(subset=["DESCRIPTION", "DESCRIPTION1"])
    sample_rows = valid.sample(n=2, random_state=random.randint(1, 10000))
    train_row, test_row = sample_rows.iloc[0], sample_rows.iloc[1]

    train_symptoms = train_row["DESCRIPTION1"]
    if pd.notna(train_row.get("DESCRIPTION2")):
        train_symptoms += f", {train_row['DESCRIPTION2']}"
    train_diagnosis = train_row["DESCRIPTION"]

    test_symptoms = test_row["DESCRIPTION1"]
    if pd.notna(test_row.get("DESCRIPTION2")):
        test_symptoms += f", {test_row['DESCRIPTION2']}"
    true_diagnosis = test_row["DESCRIPTION"]

    choice_text = ""
    if choices:
        choice_text = "\nChoose one of the following diagnoses:\n- " + "\n- ".join(choices)

    prompt = f"""Example:
Symptoms: {train_symptoms}
Diagnosis: {train_diagnosis}

Now, analyze this:
Symptoms: {test_symptoms}{choice_text}
Diagnosis:"""

    return prompt, true_diagnosis

# 6. Full test function
def run_one_shot_test(csv_path, choices):
    prompt, true_label = generate_one_shot_prompt_from_dataset(csv_path, choices)
    gpt_response = call_gpt(prompt)
    predicted_diagnosis = extract_diagnosis(gpt_response)

    true_norm = normalize_diag(true_label)
    pred_norm = normalize_diag(predicted_diagnosis)

    is_correct = true_norm == pred_norm

    print("=== One-Shot Prompt ===")
    print(prompt)
    print("\n=== GPT Response ===")
    print(gpt_response)
    print("\nTrue Diagnosis:", true_label)
    print("Predicted Diagnosis:", predicted_diagnosis)
    print("Correct:", is_correct)

    return {
        "prompt": prompt,
        "response": gpt_response,
        "true": true_label,
        "predicted": predicted_diagnosis,
        "normalized_true": true_norm,
        "normalized_pred": pred_norm,
        "correct": is_correct
    }
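
API calls such as the one in `call_gpt` can fail transiently (rate limits, timeouts). A minimal retry wrapper with exponential backoff is sketched below. The name `call_with_retry` and its parameters are hypothetical, and catching every `Exception` is a simplification chosen for illustration.

```python
import time

def call_with_retry(fn, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))

# Usage in the notebook (not executed here):
# gpt_response = call_with_retry(call_gpt, prompt)
```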


In [None]:
openai.api_key = 'xxxxxxxx' # Key masked for privacy. Replace it with your own API key.
choices = ["Mold (organism)", "Animal dander (substance)", "Penicillin V", "Food", "Medication", "Other (specify)"]
run_one_shot_test("C:/Users/SYYAD/Documents/MSAI/AI in Healthcare/LLM/synthea_sample_data_csv_nov2021/csv/allergies.csv", choices)

=== One-Shot Prompt ===
Example:
Symptoms: Wheal (finding)
Diagnosis: Cow's milk (substance)

Now, analyze this:
Symptoms: Nausea (finding), Eruption of skin (disorder)
Choose one of the following diagnoses:
- Mold (organism)
- Animal dander (substance)
- Penicillin V
- Food
- Medication
- Other (specify)
Diagnosis:

=== GPT Response ===
Diagnosis: Food

The symptoms of nausea and skin eruption are commonly associated with food allergies or intolerances. Food is a more likely diagnosis given the symptoms presented.

True Diagnosis: Shellfish (substance)
Predicted Diagnosis: Food
Correct: False


{'prompt': "Example:\nSymptoms: Wheal (finding)\nDiagnosis: Cow's milk (substance)\n\nNow, analyze this:\nSymptoms: Nausea (finding), Eruption of skin (disorder)\nChoose one of the following diagnoses:\n- Mold (organism)\n- Animal dander (substance)\n- Penicillin V\n- Food\n- Medication\n- Other (specify)\nDiagnosis:",
 'response': 'Diagnosis: Food\n\nThe symptoms of nausea and skin eruption are commonly associated with food allergies or intolerances. Food is a more likely diagnosis given the symptoms presented.',
 'true': 'Shellfish (substance)',
 'predicted': 'Food',
 'normalized_true': 'shellfish (substance)',
 'normalized_pred': 'food',
 'correct': False}
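
In the run above, the prediction "Food" is scored as incorrect because `CATEGORY_MAP` has no entry for "shellfish (substance)", so the true label is never mapped to "food". The sketch below shows how a wider mapping would change the comparison. The added entries and the name `EXTENDED_CATEGORY_MAP` are assumptions for illustration, not part of the original notebook.

```python
# Hypothetical extension of the notebook's CATEGORY_MAP (entries are assumptions).
EXTENDED_CATEGORY_MAP = {
    "soya bean (substance)": "food",
    "peanut (substance)": "food",
    "shrimp (substance)": "food",
    "shellfish (substance)": "food",
    "cow's milk (substance)": "food",
    "penicillin v": "medication",
    "amoxicillin": "medication",
}

def normalize_diag_extended(diagnosis):
    """Lower-case the label and map it to a coarse category when one is known."""
    diagnosis = diagnosis.lower().strip()
    return EXTENDED_CATEGORY_MAP.get(diagnosis, diagnosis)

true_norm = normalize_diag_extended("Shellfish (substance)")
pred_norm = normalize_diag_extended("Food")
is_correct = true_norm == pred_norm
```

With this mapping, the shellfish case would compare "food" against "food". The `CATEGORY` column visible in the dataset preview may offer a dataset-derived alternative to a hand-written map.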

**Section 6: Generating Few-Shot Prompts for GPT-4 from the Dataset and Predictions**

In [41]:
import pandas as pd

# Load allergies.csv
df = pd.read_csv('C:/Users/SYYAD/Documents/MSAI/AI in Healthcare/LLM/synthea_sample_data_csv_nov2021/csv/allergies.csv')
print(df.head())

        START  STOP                               PATIENT  \
0  2020-02-17   NaN  b9c610cd-28a6-4636-ccb6-c7a0d2a4cb85   
1  2020-02-17   NaN  b9c610cd-28a6-4636-ccb6-c7a0d2a4cb85   
2  2020-02-17   NaN  b9c610cd-28a6-4636-ccb6-c7a0d2a4cb85   
3  2020-02-17   NaN  b9c610cd-28a6-4636-ccb6-c7a0d2a4cb85   
4  2020-02-17   NaN  b9c610cd-28a6-4636-ccb6-c7a0d2a4cb85   

                              ENCOUNTER       CODE   SYSTEM  \
0  01efcc52-15d6-51e9-faa2-bee069fcbe44  111088007  Unknown   
1  01efcc52-15d6-51e9-faa2-bee069fcbe44   84489001  Unknown   
2  01efcc52-15d6-51e9-faa2-bee069fcbe44  260147004  Unknown   
3  01efcc52-15d6-51e9-faa2-bee069fcbe44  264287008  Unknown   
4  01efcc52-15d6-51e9-faa2-bee069fcbe44  256277009  Unknown   

                  DESCRIPTION     TYPE     CATEGORY    REACTION1  \
0           Latex (substance)  allergy  environment  247472004.0   
1             Mold (organism)  allergy  environment   76067001.0   
2  House dust mite (organism)  allergy  environmen

In [None]:
import pandas as pd
import openai
import re
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

# OpenAI key
openai.api_key = 'xxxxxxxx' # Key masked for privacy. Replace it with your own API key.

# Load allergy dataset
df_allergies = pd.read_csv("C:/Users/SYYAD/Documents/MSAI/AI in Healthcare/LLM/synthea_sample_data_csv_nov2021/csv/allergies.csv")
valid_cases = df_allergies.dropna(subset=["DESCRIPTION", "DESCRIPTION1", "DESCRIPTION2"])

# Direct diagnosis with choices
def generate_direct_diagnosis_prompt(df, test_symptoms, choices=None, num_examples=5):
    valid = df.dropna(subset=["DESCRIPTION", "DESCRIPTION1", "DESCRIPTION2"])
    examples = valid.sample(n=num_examples, random_state=42)

    prompt_parts = []
    for _, row in examples.iterrows():
        s1 = row["DESCRIPTION1"]
        s2 = row["DESCRIPTION2"] if pd.notna(row["DESCRIPTION2"]) else ""
        symptoms = f"{s1}, {s2}" if s2 else s1
        diagnosis = row["DESCRIPTION"]
        prompt_parts.append(f"Example:\nSymptoms: {symptoms}\nDiagnosis: {diagnosis}")

    choice_text = ""
    if choices:
        choice_text = "\nChoose one of the following diagnoses:\n- " + "\n- ".join(choices)

    test_prompt = (
        f"\nNow, analyze this:\nSymptoms: {test_symptoms}{choice_text}\n"
        "Diagnosis:"
    )
    return "\n\n".join(prompt_parts) + "\n\n" + test_prompt

# Call GPT
def call_gpt(prompt, model="gpt-4o"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    return response['choices'][0]['message']['content']

# Extract diagnosis from GPT reply
def extract_diagnosis(gpt_response):
    match = re.search(r"diagnosis\s*[:\-]\s*(.+)", gpt_response, re.IGNORECASE)
    return match.group(1).strip() if match else gpt_response.strip()

# Evaluate accuracy
def evaluate_predictions(df, test_cases, choices, num_examples=5, model="gpt-4o"):
    logs = []
    correct_count = 0

    for i, (symptoms, true_diagnosis) in enumerate(test_cases):
        prompt = generate_direct_diagnosis_prompt(df, symptoms, choices, num_examples)
        # Pass the requested model through; the default matches call_gpt's default.
        gpt_output = call_gpt(prompt, model=model)
        predicted_diagnosis = extract_diagnosis(gpt_output)

        is_correct = predicted_diagnosis.lower() == true_diagnosis.lower()
        correct_count += int(is_correct)

        logs.append({
            "Case": i + 1,
            "Symptoms": symptoms,
            "True Diagnosis": true_diagnosis,
            "Predicted Diagnosis": predicted_diagnosis,
            "Correct": is_correct,
            "Raw GPT Output": gpt_output,
            "Prompt Used": prompt
        })

    accuracy = correct_count / len(test_cases)
    result_df = pd.DataFrame(logs)
    result_df.to_csv("gpt_diagnosis_results.csv", index=False)
    print(f"Evaluation Complete: Accuracy = {accuracy:.2%}")
    return result_df

# Add confidence + severity awareness
def extract_confidence(response_text):
    if "likely" in response_text.lower():
        return "High"
    elif "possible" in response_text.lower() or "maybe" in response_text.lower():
        return "Medium"
    else:
        return "Uncertain"

def add_confidence_and_severity(results_df, df_source):
    confidences = []
    severities = []

    for _, row in results_df.iterrows():
        raw = row["Raw GPT Output"]
        conf = extract_confidence(raw)
        confidences.append(conf)

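        # regex=False: symptom strings contain parentheses, e.g. "(finding)",
        # which would otherwise be interpreted as regex groups and trigger a warning.
        match = df_source[
            (df_source["DESCRIPTION"].str.lower() == row["True Diagnosis"].lower()) &
            (df_source["DESCRIPTION1"].str.lower().str.contains(row["Symptoms"].split(',')[0].lower(), na=False, regex=False))
        ]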
        if not match.empty:
            severities.append(match.iloc[0]["SEVERITY1"])
        else:
            severities.append("Unknown")

    results_df["Confidence"] = confidences
    results_df["Severity"] = severities
    return results_df

# Confusion matrix + classification report
def analyze_results(results_df):
    y_true = results_df["True Diagnosis"].str.lower()
    y_pred = results_df["Predicted Diagnosis"].str.lower()

    cm = confusion_matrix(y_true, y_pred, labels=y_true.unique())
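    # zero_division=0 avoids undefined-metric warnings for classes that are never predicted.
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)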
    report_df = pd.DataFrame(report).transpose()

    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', xticklabels=y_true.unique(), yticklabels=y_true.unique(), cmap='Blues')
    plt.title("GPT Diagnosis Confusion Matrix")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig("confusion_matrix.png")
    plt.close()

    report_df.to_csv("classification_report.csv")
    print("Confusion matrix and classification report saved.")

# Sample test data from dataset
test_cases = []
sample = valid_cases.sample(50, random_state=100)
#print(sample.head())
for _, row in sample.iterrows():
    s1 = row["DESCRIPTION1"]
    s2 = row["DESCRIPTION2"] if pd.notna(row["DESCRIPTION2"]) else ""
    symptoms = f"{s1}, {s2}" if s2 else s1
    #print(symptoms)
    test_cases.append((symptoms, row["DESCRIPTION"]))

# Define choices
choices = ["Animal dander (substance)", "Aspirin", "Bee venom (substance)", "cefdinir", "Cow's milk (substance)", "Eggs (edible) (substance)", "Fish (substance)", "Grass pollen (substance)", "House dust mite (organism)", "Ibuprofen", "Latex (substance)", "Lisinopril", "Mold (organism)", "Peanut (substance)", "Penicillin V", "Shellfish (substance)", "Soya bean (substance)", "Sulfamethoxazole / Trimethoprim", "Tree nut (substance)", "Tree pollen (substance)", "Wheat (substance)"]

# Run everything
results_df = evaluate_predictions(valid_cases, test_cases, choices)
results_df = add_confidence_and_severity(results_df, df_allergies)
analyze_results(results_df)




Evaluation Complete: Accuracy = 34.00%




Confusion matrix and classification report saved.
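
The evaluation compares the extracted text to the true label by exact string equality, so any extra wording in the reply counts as an error. One possible refinement, sketched here as an assumption rather than the notebook's method, maps the reply onto the closed list of choices by case-insensitive substring search. `match_choice` is a hypothetical helper.

```python
def match_choice(response, choices):
    """Return the choice mentioned earliest in the response, or None if none appears."""
    text = response.lower()
    hits = []
    for choice in choices:
        position = text.find(choice.lower())
        if position != -1:
            hits.append((position, -len(choice), choice))
    if not hits:
        return None
    # Earliest mention wins; on a tie, prefer the longer (more specific) label.
    return min(hits)[2]

example_choices = ["Mold (organism)", "Peanut (substance)", "Shellfish (substance)"]
picked = match_choice(
    "Diagnosis: peanut (substance), although mold (organism) is possible.",
    example_choices,
)
```

Replies that mention no listed choice return `None`, which can be logged separately instead of being counted as a wrong label.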
