### Plan B
ECG impressions + clinical features + outcome -> LLM -> one response

In [1]:
# Import packages
import json
from huggingface_hub import login
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer, pipeline
import transformers
import random
import torch
import time
import re
from tqdm import tqdm
import pandas as pd

# Set GPUs (this must run before CUDA is initialized, i.e. before the first torch.cuda call)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

print("Available GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

# Set seeds
random.seed(0)
torch.manual_seed(0)



Available GPUs: 2
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB


<torch._C.Generator at 0x7fd9e4946930>

In [2]:
# Log into Huggingface
with open("../../huggingface_token.txt", "r") as file:
    access_token = file.read().strip()
login(access_token)

# Load Huggingface Model
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, token=access_token, low_cpu_mem_usage=True,
                    torch_dtype=torch.float16, device_map='auto')

Loading checkpoint shards: 100%|██████████| 4/4 [00:06<00:00,  1.73s/it]


In [20]:
def prompt_outcome(note, outcome):
    return note + "\nPATIENT OUTCOME: " + outcome

def get_message(note):
    system = 'You are a cardiologist. You are given the patient outcome. Your task is to provide reasoning for the patient outcome. Please structure the results as follows: REASONING: [Your explanation here]'
    prompt = f"Here is the patient data: \n{note}"

    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt}
    ]

    return messages

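def extract_assistant_response(response):
    # The decoded text still contains the prompt; keep only what follows
    # the first "assistant\n\n" marker, or return the input unchanged if absent.
    parts = response.split("assistant\n\n", 1)
    return parts[1].strip() if len(parts) > 1 else response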
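
To make the behavior of these helpers concrete, here is a minimal sketch that runs them on toy strings without loading the model. The note, the outcome label, and the shape of the decoded string are hypothetical stand-ins, not values from the dataset or real model output.

```python
def prompt_outcome(note, outcome):
    # Append the known outcome to the clinical note, as in the cell above.
    return note + "\nPATIENT OUTCOME: " + outcome


def extract_assistant_response(response):
    # Keep only the text after the first "assistant\n\n" marker, if present.
    parts = response.split("assistant\n\n", 1)
    return parts[1].strip() if len(parts) > 1 else response


# Hypothetical toy inputs, not taken from the dataset.
note = "Age: 60\nLVEF (%): 30.0"
full_prompt = prompt_outcome(note, "Survivor")

# Assumed shape: decoding with skip_special_tokens=True leaves the role
# names behind as plain text, followed by a blank line.
decoded = (
    "system\n\nYou are a cardiologist.user\n\n"
    + full_prompt
    + "assistant\n\nREASONING: Example text."
)

reply = extract_assistant_response(decoded)
fallback = extract_assistant_response("no marker here")

print(full_prompt)
print(reply)
```

Because the split happens at the first occurrence of the marker, a note that itself contained the text "assistant" followed by a blank line would be cut at the wrong place. When the marker is missing, the function returns the full input unchanged.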

In [7]:
# Load in csv file with prompts
df = pd.read_csv("../Data/subject-info-cleaned-with-prompts.csv")
# print(df['Prompts'][0])

In [21]:
print(prompt_outcome(df['Prompts'][0], df['Outcome'][0]))

Age: 58
Gender: Male 
Weight: 83 kg
Height: 163 cm
NYHA Class: III
Blood Pressure: 110/75 mmHg
Past Medical History: Idiopathic dilated cardiomyopathy
Albumin (g/L): 42,4
ALT or GPT (IU/L): 10.0
AST or GOT (IU/L): 20.0
Total Cholesterol (mmol/L): 5,4
Creatinine (mmol/L): 106.0
Gamma-glutamil transpeptidase (IU/L): 20
Glucose (mmol/L): 5,7
Hemoglobin (g/L): 132.0
HDL (mmol/L): 1,29
Potassium (mEq/L): 4,6
LDL (mmol/L): 3,36
Sodium (mEq/L): 141.0
Pro-BNP (ng/L): 1834.0
Protein (g/L): 69
T3 (pg/dL): 0,05
T4 (ng/L): 15
Troponin (ng/mL): 0,01
TSH (mIU/L): 3,02
Urea (mg/dL): 7,12
LVEF (%): 35.0
Medications: Beta Blockers, Digoxin, Loop Diuretics, ACE Inhibitor
ECG Impression:
        - Ventricular Extrasystole: Polymorphic
        - Ventricular Tachycardia: Non-sustained VT
        - Non-sustained ventricular tachycardia (CH>10): Yes
        - Paroxysmal supraventricular tachyarrhythmia: Unknown paroxysmal supraventricular tachyarrhythmia code
        - Bradycardia: Unknown bradycardia code
 

In [25]:
# Test message
message = get_message(prompt_outcome(df['Prompts'][700], df['Outcome'][700]))
message

[{'role': 'system',
  'content': 'You are a cardiologist. You are given the patient outcome. Your task is to provide reasoning for the patient outcome. Please structure the results as follows: REASONING: [Your explanation here]'},
 {'role': 'user',
  'content': 'Here is the patient data: \nAge: 65\nGender: Male \nWeight: 79 kg\nHeight: 162 cm\nNYHA Class: II\nBlood Pressure: 115/80 mmHg\nPast Medical History: Idiopathic dilated cardiomyopathy, Dyslipemia, Hypertension, Myocardial Infarction\nAlbumin (g/L): 49,6\nALT or GPT (IU/L): 18.0\nAST or GOT (IU/L): 26.0\nTotal Cholesterol (mmol/L): 4,68\nCreatinine (mmol/L): 149.0\nGamma-glutamil transpeptidase (IU/L): 14\nGlucose (mmol/L): 4,9\nHemoglobin (g/L): 134.0\nHDL (mmol/L): 1,11\nPotassium (mEq/L): 4,6\nLDL (mmol/L): 2,82\nSodium (mEq/L): 142.0\nPro-BNP (ng/L): 1114.0\nProtein (g/L): 76,9\nT4 (ng/L): 16\nTSH (mIU/L): 0,86\nUrea (mg/dL): 8,99\nLVEF (%): 25.0\nMedications: Beta Blockers, Loop Diuretics, Spironolactone, Statins, ACE Inhib

In [26]:
# Put message into LLM
input_text = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
# The chat template already prepends the BOS token, so do not add special tokens again
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)
output = model.generate(**inputs, max_new_tokens=1000)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
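
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the reply can also be isolated by position rather than by searching the decoded text for a marker. The sketch below illustrates the idea on plain lists; `new_token_ids` is a hypothetical helper, not part of the notebook or of `transformers`, and the token IDs are made up.

```python
def new_token_ids(output_ids, prompt_length):
    """Return only the IDs generated after the prompt.

    output_ids: one generated sequence (prompt followed by continuation).
    prompt_length: number of prompt tokens in that sequence.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy IDs standing in for real token IDs (hypothetical values).
prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 44]

continuation = new_token_ids(generated, len(prompt_ids))
print(continuation)
```

In this notebook the same slicing would presumably look like `new_token_ids(output[0], inputs["input_ids"].shape[1])`, with the result passed to `tokenizer.decode(..., skip_special_tokens=True)`. That avoids depending on the literal "assistant" marker surviving in the decoded text.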


In [27]:
# Get result
result = tokenizer.decode(output[0], skip_special_tokens = True)
result = result.replace("**", "")
result = extract_assistant_response(result)
print(result)

REASONING: 

The patient's outcome of sudden cardiac death can be attributed to a combination of his underlying medical conditions and laboratory results. 

Firstly, the patient has a history of idiopathic dilated cardiomyopathy, which is a condition where the heart muscle becomes weakened and the heart chambers enlarge, leading to reduced heart function. The patient's left ventricular ejection fraction (LVEF) of 25% indicates severe left ventricular dysfunction, which increases the risk of sudden cardiac death.

Furthermore, the patient has a history of myocardial infarction, which can lead to scarring of the heart muscle and further compromise cardiac function. His elevated Pro-BNP (B-type natriuretic peptide) level of 1114 ng/L is indicative of heart failure, which is a significant risk factor for sudden cardiac death.

Additionally, the patient's blood pressure is slightly elevated (115/80 mmHg), which may be a concern for further cardiovascular strain. His lipid profile shows an e