<center><h1>Medical Diagnosis Generator</h1><h4>Rebecca Hinrichs</h4><h5>∙ a walkthrough build of a Large Language Model (LLM) ∙</h5><h4>FALL 2023</h4></center>

---
---

---
1. Dataset

<center><hr style="width: 50%; border-color: black;"></center><br>

---
2. Methods

<center><hr style="width: 50%; border-color: black;"></center><br>

---
3. Fine-Tuning Process

<center><hr style="width: 50%; border-color: black;"></center><br>

---
4. Inference

<center><hr style="width: 50%; border-color: black;"></center><br>

---
---
<center><h2>The Code:</h2>
<hr style="width: 50%; border-color: black;"></center>

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Make GPU 1 visible as cuda:0 (set before torch initializes CUDA)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # let the allocator grow segments to reduce fragmentation

import torch
print("CUDA Available:", torch.cuda.is_available())
print("Device Count:", torch.cuda.device_count())
print("Current Device:", torch.cuda.current_device())
print("Device Name:", torch.cuda.get_device_name(torch.cuda.current_device()))

CUDA Available: True
Device Count: 1
Current Device: 0
Device Name: NVIDIA GeForce RTX 3060
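Because `CUDA_VISIBLE_DEVICES` is set before `torch` touches the GPU, PyTorch only enumerates physical GPU 1, which is why the cell reports a single device at index 0. The renumbering can be sketched as below. `physical_gpu_for` is a hypothetical helper, and it assumes the variable holds plain integer indices (it may also hold GPU UUIDs, which this sketch does not handle).

```python
def physical_gpu_for(logical_index: int, visible_devices: str) -> int:
    """Return the physical GPU index behind the logical device cuda:<logical_index>.

    Hypothetical helper: assumes `visible_devices` is a comma-separated list of
    integer indices such as "1" or "2,0".
    """
    physical = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    if not 0 <= logical_index < len(physical):
        raise IndexError(
            f"cuda:{logical_index} is not visible with CUDA_VISIBLE_DEVICES={visible_devices!r}"
        )
    # Logical devices are numbered 0..n-1 in the order they appear in the list
    return physical[logical_index]


# With CUDA_VISIBLE_DEVICES="1", cuda:0 is physical GPU 1 and nothing else is visible
print(physical_gpu_for(0, "1"))    # 1
print(physical_gpu_for(1, "2,0"))  # 0
```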


In [None]:
# Install the required libraries
!pip install -r requirements.txt

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
import torch
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)

CUDA available: True
CUDA version: 12.1


In [4]:
# Import the required libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, Trainer, DataCollatorForLanguageModeling
import transformers
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset, Dataset

# Hub repo ID: Falcon-7B re-saved as sharded bf16 checkpoint files (same model, downloaded on first load)
model = "ybelkada/falcon-7b-sharded-bf16"

# Instantiate Falcon's tokenizer object
tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Instantiate QLoRA's 4-bit configuration
bb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16)

# Load the Falcon model with the 4-bit quantization config
falcon_model = AutoModelForCausalLM.from_pretrained(
    model,                             # Hub repo ID defined above
    quantization_config=bb_config,     # 4-bit NF4 weights with double quantization
    use_cache=False)                   # disable the KV cache (not needed for training; conflicts with gradient checkpointing)

Loading checkpoint shards: 100%|██████████| 8/8 [00:11<00:00,  1.38s/it]
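A rough estimate of why 4-bit loading matters on a consumer RTX 3060 (sold with 8 GB or 12 GB): weight storage is about parameters × bits per parameter ÷ 8. The sketch below assumes a round 7 billion parameters and treats every weight as quantized (in practice some layers stay in higher precision), so the numbers are approximate. The per-parameter overhead of the NF4 quantization constants follows the QLoRA paper: one fp32 constant per 64-weight block (0.5 extra bits), or about 0.127 extra bits with double quantization.

```python
N_PARAMS = 7e9  # assumption: round parameter count for Falcon-7B
BLOCK = 64      # NF4 block size: one absmax constant per 64 weights


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


plain_nf4_bits = 4 + 32 / BLOCK                         # 4-bit weights + fp32 block constants
double_quant_bits = 4 + 8 / BLOCK + 32 / (BLOCK * 256)  # 8-bit constants + fp32 second-level constants

for label, bits in [("bf16", 16), ("nf4", plain_nf4_bits), ("nf4 + double quant", double_quant_bits)]:
    print(f"{label:>20}: {weight_gb(N_PARAMS, bits):.2f} GB")
```

The bf16 copy alone (about 14 GB) would exceed either card's memory, while the NF4 weights (under 4 GB) leave room for activations and optimizer state.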


In [5]:
# Query Sample 1
text = "Question: What is the national bird of the United States? \n Answer: "
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")
outputs = falcon_model.generate(input_ids=inputs.input_ids, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Response: correct, but generation keeps going and starts a new "Question:" until the 10-token budget runs out

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Question: What is the national bird of the United States? 
 Answer:  The bald eagle.
 Question: What is the
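Sample 1 shows two separate issues. The warnings should go away if the attention mask and a pad token are passed explicitly, for example `falcon_model.generate(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, pad_token_id=tokenizer.eos_token_id, max_new_tokens=10)`. The run-on continuation can be trimmed after decoding. Below is a minimal post-processing sketch, assuming the answer ends where the model begins a new `Question:`; `extract_answer` is a hypothetical helper, not part of transformers.

```python
def extract_answer(decoded: str, prompt: str, stop: str = "Question:") -> str:
    """Remove the echoed prompt, then cut the continuation at the next `stop` marker."""
    if not decoded.startswith(prompt):
        # Assumption: decode() reproduced the prompt verbatim; otherwise we cannot split safely
        raise ValueError("decoded text does not start with the prompt")
    completion = decoded[len(prompt):]
    cut = completion.find(stop)
    if cut != -1:
        completion = completion[:cut]
    return completion.strip()


prompt = "Question: What is the national bird of the United States? \n Answer: "
decoded = prompt + " The bald eagle.\n Question: What is the"
print(extract_answer(decoded, prompt))  # The bald eagle.
```

A sturdier variant avoids string matching on the prompt entirely by decoding only the newly generated tokens, `outputs[0][inputs.input_ids.shape[1]:]`.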


In [6]:
# Query Sample 2
text2 = "How do I make a HTML hyperlink?"
inputs = tokenizer(text2, return_tensors="pt").to("cuda:0")
outputs = falcon_model.generate(input_ids=inputs.input_ids, max_new_tokens=35)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Response: echoes the prompt and starts a step list, but never reaches the actual answer

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


How do I make a HTML hyperlink?
How do I make a HTML hyperlink?
How to Create a Hyperlink in HTML
- Step 1: Create a Hyperlink.
- Step 2:
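Sample 2 was sent as a bare question, while Sample 1 used an explicit `Question: ... Answer:` frame and got a direct answer. A base (non-instruction-tuned) model like Falcon-7B tends to continue whatever pattern it sees, so wrapping every query in the same frame is a cheap thing to try. `build_prompt` is a hypothetical helper that simply reuses Sample 1's format; it is not an official Falcon prompt template.

```python
def build_prompt(question: str) -> str:
    """Wrap a raw question in the Question/Answer frame used in Query Sample 1."""
    return f"Question: {question.strip()} \n Answer: "


print(repr(build_prompt("How do I make a HTML hyperlink?")))
```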


In [7]:
# Query Sample 3
text3 = "A 25-year-old female presents with swelling, pain, and inability to bear weight on her left ankle following a fall during a basketball game where she landed awkwardly on her foot. The pain is on the outer side of her ankle. What is the likely diagnosis and next steps? "
inputs = tokenizer(text3, return_tensors="pt").to("cuda:0")
outputs = falcon_model.generate(input_ids=inputs.input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Response: invents multiple-choice options, then repeats them and drifts toward more severe injuries

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


A 25-year-old female presents with swelling, pain, and inability to bear weight on her left ankle following a fall during a basketball game where she landed awkwardly on her foot. The pain is on the outer side of her ankle. What is the likely diagnosis and next steps? (A) Ankle sprain (B) Ankle fracture (C) Ankle dislocation (D) Ankle fracture with dislocation (E) Ankle fracture with dislocation and ankle sprain
Ankle sprain
Ankle fracture
Ankle dislocation
Ankle fracture with dislocation and ankle sprain
Ankle fracture with dislocation and ankle sprain
Ankle fracture with dislocation and ankle sprain
Ankle fracture with dislocation
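Sample 3 loops on the same phrases. `generate()` accepts decoding controls aimed at this, such as `repetition_penalty` and `no_repeat_ngram_size`. The sketch below shows the idea behind the latter in plain Python: before each new token, ban any token that would recreate an n-gram already present in the sequence. It works on toy integer token IDs and illustrates the technique; it is not the transformers implementation.

```python
def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Tokens that would complete an n-gram already seen in `tokens`."""
    if n < 1 or len(tokens) < n - 1:
        return set()
    if n == 1:
        # Every previously generated token would repeat a unigram
        return set(tokens)
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same n-1 tokens, its last token is banned
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


# Sequence "5 7 9 5 7": emitting 9 next would repeat the trigram (5, 7, 9)
print(banned_next_tokens([5, 7, 9, 5, 7], 3))  # {9}
```

In the notebook, the equivalent knob would be an argument such as `no_repeat_ngram_size=3` to `falcon_model.generate`; the value 3 is an assumption to tune, and setting it too low can block legitimate repeated phrases.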


<h1>Fine-Tuning</h1>

In [8]:
# Fine-Tuning Configuration (one epoch, batch size of 1, no gradient accumulation)
training_args = TrainingArguments(
    output_dir="./finetuned_falcon",
    eval_strategy="epoch",           # evaluate on the held-out split after each epoch
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,                       # mixed-precision training
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=1,
    logging_steps=1,
    num_train_epochs=1,
    optim="paged_adamw_8bit",        # bitsandbytes 8-bit AdamW with paged optimizer states
    report_to="none"                 # no external experiment tracking
)
falcon_model.gradient_checkpointing_enable()
falcon_model = prepare_model_for_kbit_training(falcon_model)

# Instantiate LoRA Configuration
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",])
lora_model = get_peft_model(falcon_model, lora_config)

# Function to print the actual LoRA parameters vs total in Falcon
# https://dataman-ai.medium.com/fine-tune-a-gpt-lora-e9b72ad4ad3
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

# Demonstrate our trainable parameters
print_trainable_parameters(lora_model)

trainable params: 8159232 || all params: 3616904064 || trainable%: 0.22558607736409123
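The 8,159,232 trainable parameters can be reproduced by hand. A LoRA adapter of rank r on a linear layer with `d_in` inputs and `d_out` outputs adds two matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) parameters. The layer sizes below are taken from Falcon-7B's published configuration (hidden size 4544, 32 decoder blocks, multi-query attention with 64-dimensional heads, 4× MLP) and are assumptions rather than values read from the loaded model. The "all params" figure (about 3.6B rather than 7B) is most likely lower because bitsandbytes stores 4-bit weights packed two per byte, so `numel()` counts only half of them.

```python
HIDDEN = 4544     # assumption: Falcon-7B hidden size
HEAD_DIM = 64     # assumption: per-head dimension
N_LAYERS = 32     # assumption: number of decoder blocks
FFN = 4 * HIDDEN  # 18176, the 4x MLP width
R = 4             # lora_config.r
LORA_ALPHA = 32   # lora_config.lora_alpha

# (d_in, d_out) of each target module in one decoder block
targets = {
    "query_key_value": (HIDDEN, HIDDEN + 2 * HEAD_DIM),  # all query heads + one shared K head and V head
    "dense":           (HIDDEN, HIDDEN),
    "dense_h_to_4h":   (HIDDEN, FFN),
    "dense_4h_to_h":   (FFN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r); bias='none' adds nothing."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in targets.values())
total = per_layer * N_LAYERS
print(per_layer, total)  # 254976 8159232

# PEFT scales the low-rank update B @ A by lora_alpha / r before adding it to the frozen weight
print(LORA_ALPHA / R)    # 8.0
```

PEFT models also provide a built-in `lora_model.print_trainable_parameters()` that reports the same kind of summary as the helper above.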


In [9]:
# Load the MedText dataset from the Hugging Face Hub
import pandas as pd
dataset = load_dataset("BI55/MedText", split="train")
df = pd.DataFrame(dataset)

# Merge each prompt and its completion into one training text
prompt = df.pop("Prompt")
comp = df.pop("Completion")
df["Info"] = prompt + "\n" + comp

# Preview the first five prompt/completion pairs
list_prompt = list(prompt)
list_comp = list(comp)
for i in range(5):
    print(list_prompt[i])
    print()
    print(list_comp[i])
    print("\n\n\n*********")

# Function to tokenize the dataset, calling the tokenizer on `chunk_size` texts at a time
# https://www.kaggle.com/code/harveenchadha/tokenize-train-data-using-bert-tokenizer
def tokenizing(text, tokenizer, chunk_size, maxlen):
    input_ids = []
    tt_ids = []
    at_ids = []
    for i in range(0, len(text), chunk_size):
        text_chunk = text[i:i+chunk_size]
        encs = tokenizer(
                    text_chunk,
                    max_length=maxlen,      # pad/truncate every example to `maxlen` tokens
                    padding='max_length',
                    truncation=True)
        input_ids.extend(encs['input_ids'])
        tt_ids.extend(encs['token_type_ids'])
        at_ids.extend(encs['attention_mask'])
    return {'input_ids': input_ids, 'token_type_ids': tt_ids, 'attention_mask': at_ids}

# Tokenize the data (2048 is the max token length Falcon accepts)
tokens = tokenizing(list(df["Info"]), tokenizer, 256, 2048)
tokens_dataset = Dataset.from_dict(tokens)
split_dataset = tokens_dataset.train_test_split(test_size=0.2)

# Instantiate the Trainer; mlm=False makes the collator build causal-LM labels from input_ids
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=split_dataset["train"],
    eval_dataset=split_dataset["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))

A 50-year-old male presents with a history of recurrent kidney stones and osteopenia. He has been taking high-dose vitamin D supplements due to a previous diagnosis of vitamin D deficiency. Laboratory results reveal hypercalcemia and hypercalciuria. What is the likely diagnosis, and what is the treatment?

This patient's history of recurrent kidney stones, osteopenia, and high-dose vitamin D supplementation, along with laboratory findings of hypercalcemia and hypercalciuria, suggest the possibility of vitamin D toxicity. Excessive intake of vitamin D can cause increased absorption of calcium from the gut, leading to hypercalcemia and hypercalciuria, which can result in kidney stones and bone loss. Treatment would involve stopping the vitamin D supplementation and potentially providing intravenous fluids and loop diuretics to promote the excretion of calcium.



*********
A 7-year-old boy presents with a fever, headache, and severe earache. He also complains of dizziness and a spinning 

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
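With `mlm=False`, `DataCollatorForLanguageModeling` builds labels by copying `input_ids` and replacing every pad-token position with -100, the default `ignore_index` of PyTorch's cross-entropy loss (the one-token shift for next-token prediction happens inside the model). Because this notebook sets `pad_token = eos_token`, any genuine end-of-sequence token is masked in the same way, which may be one reason the model never learns where to stop. The plain-Python sketch below mirrors that masking on a toy sequence; `causal_lm_labels` and all token IDs, including the shared pad/EOS id, are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_ID = 0           # hypothetical id shared by pad and EOS, mirroring pad_token = eos_token


def causal_lm_labels(input_ids: list[int], pad_id: int) -> list[int]:
    """Copy input_ids and hide every pad position from the loss, as mlm=False does."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


# Toy example: 4 text tokens, one EOS we would like the model to learn, then padding
input_ids = [101, 57, 88, 23, PAD_ID, PAD_ID, PAD_ID]
labels = causal_lm_labels(input_ids, PAD_ID)
print(labels)  # [101, 57, 88, 23, -100, -100, -100]
```

The EOS at position 4 is indistinguishable from padding, so it is masked too. A common workaround is to append the EOS string to each training text and give the tokenizer a separate pad token; whether that would fix the run-on generations here is untested.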


In [10]:
# Train for one epoch, then save the LoRA adapter weights (not the full Falcon model)
trainer.train()
trainer.model.save_pretrained("./finetuned_falcon")

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 1.3681        | 1.272765        |
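The reported losses are mean per-token cross-entropy in nats, so exponentiating them gives perplexity, which is sometimes easier to read (roughly, how many equally likely next tokens the model is choosing among). This is a standard conversion, not something the Trainer reports in this run.

```python
import math

train_loss = 1.3681   # from the training table above
val_loss = 1.272765   # from the training table above

for name, loss in [("train", train_loss), ("validation", val_loss)]:
    # Perplexity is the exponential of the mean natural-log cross-entropy
    print(f"{name:>10} loss {loss:.4f} -> perplexity {math.exp(loss):.2f}")
```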


In [14]:
# Load the saved LoRA adapter onto the quantized Falcon model
from peft import PeftConfig, PeftModel
config = PeftConfig.from_pretrained('./finetuned_falcon')  # adapter settings (r, alpha, base model ID); not used below
finetuned_model = PeftModel.from_pretrained(falcon_model, './finetuned_falcon')

# Re-run Query Sample 3 on the fine-tuned model
text4 = "A 25-year-old female presents with swelling, pain, and inability to bear weight on her left ankle following a fall during a basketball game where she landed awkwardly on her foot. The pain is on the outer side of her ankle. What is the likely diagnosis and next steps?"
inputs = tokenizer(text4, return_tensors="pt").to("cuda:0")
outputs = finetuned_model.generate(input_ids=inputs.input_ids, max_new_tokens=90)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Response: names a likely diagnosis (lateral ankle sprain) and coherent next steps, including imaging if symptoms persist

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


A 25-year-old female presents with swelling, pain, and inability to bear weight on her left ankle following a fall during a basketball game where she landed awkwardly on her foot. The pain is on the outer side of her ankle. What is the likely diagnosis and next steps?
This patient's symptoms suggest a lateral ankle sprain, which is a common injury in sports. The next steps would include rest, ice, compression, and elevation (RICE) of the ankle, followed by physical therapy to restore ankle range of motion and strength. If symptoms persist, imaging may be necessary to rule out a fracture. If a fracture is confirmed, immobilization and referral to an orthopedist may be necessary.



In [11]:
# Saving our working dependencies
!pip freeze > requirements.txt

---
<center><i>Thank you!</i></center>

<center><hr style="width: 50%; border-color: black;"></center><br>