# MediSimplifier - Inference Demonstration

**Medical Discharge Summary Simplification using LoRA Fine-Tuned Models**

This notebook demonstrates inference with all three MediSimplifier models hosted on HuggingFace:
- **OpenBioLLM-8B** 🏆 (Best overall performance)
- **Mistral-7B** (Best readability)
- **BioMistral-7B-DARE** (Medical baseline)

**Authors:** Guy Dor & Shmulik Avraham  
**Course:** DS25 Deep Learning, Technion  
**Date:** January 2026

### Resources
- **Models:** [HuggingFace](https://huggingface.co/GuyDor007/MediSimplifier-LoRA-Adapters)
- **Dataset:** [HuggingFace](https://huggingface.co/datasets/GuyDor007/medisimplifier-dataset)
- **Code:** [GitHub](https://github.com/gd007/MediSimplifier)

## 1. Environment Setup

In [1]:
# Install required packages (uncomment if needed)
# !pip install torch transformers peft datasets accelerate -q

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from datasets import load_dataset
import warnings
warnings.filterwarnings('ignore')

# Detect device
if torch.cuda.is_available():
    DEVICE = "cuda"
    DTYPE = torch.bfloat16
elif torch.backends.mps.is_available():
    DEVICE = "mps"
    DTYPE = torch.float32  # MPS works better with float32
else:
    DEVICE = "cpu"
    DTYPE = torch.float32

print(f"Using device: {DEVICE}")
print(f"PyTorch version: {torch.__version__}")

Using device: mps
PyTorch version: 2.9.1


## 2. Load Dataset from HuggingFace

In [3]:
# Load MediSimplifier dataset
dataset = load_dataset("GuyDor007/medisimplifier-dataset")

print("Dataset loaded successfully!")
print(f"  Train:      {len(dataset['train']):,} samples")
print(f"  Validation: {len(dataset['validation']):,} samples")
print(f"  Test:       {len(dataset['test']):,} samples")
print(f"  Columns:    {dataset['train'].column_names}")

Dataset loaded successfully!
  Train:      7,999 samples
  Validation: 999 samples
  Test:       1,001 samples
  Columns:    ['text', 'instruction', 'input', 'output']
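As a quick sanity check, the split sizes printed above correspond to roughly an 80/10/10 train/validation/test split of 9,999 samples. Here is a minimal stdlib sketch of that arithmetic, with the counts copied by hand from the output above:

```python
# Split sizes copied from the dataset summary printed above
split_sizes = {"train": 7_999, "validation": 999, "test": 1_001}

total = sum(split_sizes.values())  # 9,999 samples overall

# Fraction of the full dataset that falls in each split, rounded to 3 decimals
split_fractions = {name: round(n / total, 3) for name, n in split_sizes.items()}

print(f"Total samples: {total:,}")
for name, frac in split_fractions.items():
    print(f"  {name:<10} {frac:.1%}")
```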


In [4]:
# Select a test sample for demonstration
TEST_INDEX = 0  # Change this to test different samples
test_sample = dataset['test'][TEST_INDEX]

medical_text = test_sample['input']
ground_truth = test_sample['output']

print("=" * 70)
print("ORIGINAL MEDICAL TEXT")
print("=" * 70)
print(medical_text)

ORIGINAL MEDICAL TEXT
Discharge Summary:

Patient Name: [REDACTED]
Medical Record Number: [REDACTED]

Hospital Course:
The patient was admitted to our outpatient clinic with widespread skin-colored, dome-shaped lesions over the face and neck, which gradually increased in size and number over a year. The patient disclosed his HIV-positive status, with a CD4 + count of 82 cells/mm3. The patient also reported using a standard antiretroviral therapy regimen for the last five years. The clinical diagnosis was made due to the presence of molluscae bodies in some of the skin lesions, and the patient was confirmed with molluscum contagiosum. No systemic modalities of retinoid or other antiviral medication were used previously. The patient was prescribed multiple topical and oral medications, but the lesions showed no significant improvement.

Discharge Diagnosis:
Diagnosis: Molluscum contagiosum

Hospital Course Summary:
The patient was treated with oral isotretinoin 0.5 mg/kg for one month, c

## 3. Model Configuration

All three LoRA adapters are hosted in a single HuggingFace repo, one per subfolder (the base models are downloaded from their own repos):
- `openbiollm_8b_lora/` - Llama3 architecture, uses **ChatML** format
- `mistral_7b_lora/` - Mistral architecture, uses **Mistral** format
- `biomistral_7b_dare_lora/` - Mistral architecture, uses **Mistral** format

In [5]:
# HuggingFace repo containing all adapters
HF_REPO = "GuyDor007/MediSimplifier-LoRA-Adapters"

# Model configurations
MODELS = {
    "OpenBioLLM-8B": {
        "base_model": "aaditya/Llama3-OpenBioLLM-8B",
        "adapter_subfolder": "openbiollm_8b_lora",
        "format": "chatml",
    },
    "Mistral-7B": {
        "base_model": "mistralai/Mistral-7B-Instruct-v0.2",
        "adapter_subfolder": "mistral_7b_lora",
        "format": "mistral",
    },
    "BioMistral-7B": {
        "base_model": "BioMistral/BioMistral-7B-DARE",
        "adapter_subfolder": "biomistral_7b_dare_lora",
        "format": "mistral",
    },
}

print("Model configurations loaded.")

Model configurations loaded.
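Because the prompt format is selected per model, a mismatch between `format` and the base model would silently degrade outputs rather than raise an error. Below is a hedged sketch of a lightweight consistency check over a dictionary shaped like `MODELS`. The helper `validate_model_config` and the rule that any base model whose name contains "mistral" must use the `mistral` format are illustrative assumptions, not part of the original pipeline:

```python
# Required keys and allowed formats, mirroring the MODELS dictionary above
REQUIRED_KEYS = {"base_model", "adapter_subfolder", "format"}
KNOWN_FORMATS = {"chatml", "mistral"}


def validate_model_config(models: dict) -> list:
    """Return a list of problems found (an empty list means the config looks consistent)."""
    problems = []
    for name, cfg in models.items():
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
            continue
        if cfg["format"] not in KNOWN_FORMATS:
            problems.append(f"{name}: unknown format {cfg['format']!r}")
        # Assumed heuristic: Mistral-family base models are prompted with [INST]...[/INST]
        if "mistral" in cfg["base_model"].lower() and cfg["format"] != "mistral":
            problems.append(f"{name}: Mistral-family base model not using 'mistral' format")
    return problems


# Hand-copied version of the configuration defined in the cell above
example_models = {
    "OpenBioLLM-8B": {"base_model": "aaditya/Llama3-OpenBioLLM-8B",
                      "adapter_subfolder": "openbiollm_8b_lora", "format": "chatml"},
    "Mistral-7B": {"base_model": "mistralai/Mistral-7B-Instruct-v0.2",
                   "adapter_subfolder": "mistral_7b_lora", "format": "mistral"},
    "BioMistral-7B": {"base_model": "BioMistral/BioMistral-7B-DARE",
                      "adapter_subfolder": "biomistral_7b_dare_lora", "format": "mistral"},
}

print(validate_model_config(example_models) or "Config OK")  # Config OK
```

In the notebook itself, the same check could be run directly on `MODELS`.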


## 4. Prompt Templates & Utility Functions

**Important:** Each model must be prompted in the same format it was fine-tuned with:
- **OpenBioLLM-8B:** ChatML format (`<|im_start|>...<|im_end|>`)
- **Mistral-7B / BioMistral-7B:** Mistral format (`[INST]...[/INST]`)

In [7]:
# System message and task instruction (consistent across all models)
SYSTEM_MESSAGE = "You are a helpful medical assistant that simplifies complex medical text for patients."

TASK_INSTRUCTION = """Simplify the following medical discharge summary in plain language for patients with no medical background.
Guidelines:
- Replace medical jargon with everyday words (e.g., "hypertension" → "high blood pressure")
- Keep all important information (diagnoses, medications, follow-up instructions)
- Use short, clear sentences (aim for 15-20 words per sentence)
- Aim for a 6th-grade reading level
- Maintain the same structure as the original
- Do not add or omit information
- Keep the same patient reference style (e.g., "The patient" stays "The patient", not "You")
- Output plain text only (no markdown, no bold, no headers, no bullet points)
- Do not include empty lines or separator characters like "---\" """


def build_prompt(medical_text: str, format_type: str) -> str:
    """
    Build prompt using the correct format for each model architecture.
    
    Args:
        medical_text: The medical discharge summary to simplify
        format_type: 'chatml' for OpenBioLLM, 'mistral' for Mistral/BioMistral
    
    Returns:
        Formatted prompt string
    """
    if format_type == "chatml":
        # ChatML format for OpenBioLLM-8B (Llama3 architecture)
        prompt = f"""<|im_start|>system
{SYSTEM_MESSAGE}<|im_end|>
<|im_start|>user
{TASK_INSTRUCTION}

{medical_text}<|im_end|>
<|im_start|>assistant
"""
    elif format_type == "mistral":
        # Mistral format for Mistral-7B and BioMistral-7B
        prompt = f"""[INST] <<SYS>>
{SYSTEM_MESSAGE}
<</SYS>>

{TASK_INSTRUCTION}

{medical_text} [/INST]"""
    else:
        raise ValueError(f"Unknown format type: {format_type}")
    
    return prompt


def clean_output(generated_text: str, format_type: str) -> str:
    """
    Post-process model output to clean up any artifacts.
    
    Args:
        generated_text: Decoded text of the newly generated tokens (prompt already sliced off)
        format_type: The prompt format used
    
    Returns:
        Cleaned simplified text
    """
    cleaned = generated_text.strip()
    
    # Truncate at a new turn BEFORE stripping special tokens: once "<|im_start|>"
    # or "[INST]" has been removed, the new-turn marker can no longer be found
    if format_type == "chatml":
        if "<|im_start|>user" in cleaned:
            cleaned = cleaned.split("<|im_start|>user")[0].strip()
        # Remove format-specific tokens
        for token in ["<|im_start|>", "<|im_end|>", "<|end_of_text|>", "<|eot_id|>"]:
            cleaned = cleaned.replace(token, "").strip()
    
    elif format_type == "mistral":
        if "[INST]" in cleaned:
            cleaned = cleaned.split("[INST]")[0].strip()
        # Remove format-specific tokens
        for token in ["</s>", "<s>", "[/INST]"]:
            cleaned = cleaned.replace(token, "").strip()
    
    return cleaned


def load_model(model_name: str):
    """
    Load base model with LoRA adapter from HuggingFace.
    
    Args:
        model_name: Key from MODELS dictionary
    
    Returns:
        Tuple of (model, tokenizer, format_type)
    """
    config = MODELS[model_name]
    
    print(f"  Loading base model: {config['base_model']}")
    base_model = AutoModelForCausalLM.from_pretrained(
        config['base_model'],
        torch_dtype=DTYPE,
        device_map="auto" if DEVICE == "cuda" else None,
        trust_remote_code=True
    )
    
    if DEVICE == "mps":
        base_model = base_model.to(DEVICE)
    
    print(f"  Loading LoRA adapter: {HF_REPO}/{config['adapter_subfolder']}")
    model = PeftModel.from_pretrained(
        base_model,
        HF_REPO,
        subfolder=config['adapter_subfolder']
    )
    model.eval()
    
    print(f"  Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(
        config['base_model'],
        trust_remote_code=True
    )
    
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    return model, tokenizer, config['format']


def run_inference(model, tokenizer, medical_text: str, format_type: str, max_new_tokens: int = 768) -> str:
    """
    Run inference on a single medical text.
    
    Args:
        model: The loaded model with LoRA adapter
        tokenizer: The tokenizer
        medical_text: Input medical text to simplify
        format_type: 'chatml' or 'mistral'
        max_new_tokens: Maximum tokens to generate
    
    Returns:
        Simplified text
    """
    # Build prompt with correct format
    prompt = build_prompt(medical_text, format_type)
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_length = inputs["input_ids"].shape[1]
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    
    # Decode only the generated portion
    generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=False)
    
    # Clean output
    simplified = clean_output(generated_text, format_type)
    
    return simplified


print("Utility functions defined.")
print("  - OpenBioLLM-8B uses ChatML format")
print("  - Mistral-7B and BioMistral-7B use Mistral format")

Utility functions defined.
  - OpenBioLLM-8B uses ChatML format
  - Mistral-7B and BioMistral-7B use Mistral format
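The order of operations in post-processing matters: the text has to be cut at the first sign of a new turn *before* template tokens are stripped, because once `<|im_start|>` or `[INST]` is deleted, markers such as `<|im_start|>user` can no longer be found. Here is a stdlib-only sketch of that truncate-then-strip order. The names `STOP_MARKERS`, `STRIP_TOKENS`, and `postprocess` are hypothetical, and the token lists are copied from `clean_output`:

```python
# Markers that signal the model has started a new turn (truncate here)
STOP_MARKERS = {
    "chatml": ["<|im_start|>user"],
    "mistral": ["[INST]"],
}
# Template tokens to delete from whatever text remains
STRIP_TOKENS = {
    "chatml": ["<|im_start|>", "<|im_end|>", "<|end_of_text|>", "<|eot_id|>"],
    "mistral": ["</s>", "<s>", "[INST]", "[/INST]"],
}


def postprocess(generated_text: str, format_type: str) -> str:
    """Truncate at the first new-turn marker, then remove leftover template tokens."""
    if format_type not in STOP_MARKERS:
        raise ValueError(f"Unknown format type: {format_type}")
    text = generated_text
    # 1) Cut everything from the first new-turn marker onward
    for marker in STOP_MARKERS[format_type]:
        text = text.split(marker, 1)[0]
    # 2) Only now strip the remaining special tokens
    for token in STRIP_TOKENS[format_type]:
        text = text.replace(token, "")
    return text.strip()


raw = "The patient has a skin infection.<|im_end|>\n<|im_start|>user\nNext question"
print(postprocess(raw, "chatml"))  # The patient has a skin infection.
```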


## 5. Inference with OpenBioLLM-8B (Best Model) 🏆

**Best overall performance:**
- ROUGE-L: **0.6749**
- SARI: **74.64**
- BERTScore: **0.9498**
- FK-Grade: 7.16
- Improvement: **+157.3%** ROUGE-L over its zero-shot baseline

In [8]:
print("=" * 70)
print("Loading OpenBioLLM-8B with LoRA adapter...")
print("=" * 70)

model_openbio, tokenizer_openbio, format_openbio = load_model("OpenBioLLM-8B")

print("\n✅ OpenBioLLM-8B loaded successfully!")

Loading OpenBioLLM-8B with LoRA adapter...
  Loading base model: aaditya/Llama3-OpenBioLLM-8B


`torch_dtype` is deprecated! Use `dtype` instead!


  Loading LoRA adapter: GuyDor007/MediSimplifier-LoRA-Adapters/openbiollm_8b_lora


  Loading tokenizer...


✅ OpenBioLLM-8B loaded successfully!


In [9]:
print("Generating simplified text with OpenBioLLM-8B...")
print("(This may take 1-2 minutes on CPU/MPS)\n")

output_openbio = run_inference(model_openbio, tokenizer_openbio, medical_text, format_openbio)

print("=" * 70)
print("OpenBioLLM-8B OUTPUT")
print("=" * 70)
print(output_openbio)

Generating simplified text with OpenBioLLM-8B...
(This may take 1-2 minutes on CPU/MPS)

OpenBioLLM-8B OUTPUT
Discharge Summary:

Patient Name: [REDACTED]
Medical Record Number: [REDACTED]

Hospital Course:
The patient came to our clinic with many small, raised bumps on the face and neck. These bumps had slowly grown in number and size over one year. The patient said he has HIV and his immune cell count is 82. The patient has been taking HIV medicine for five years. Doctors found the cause of the bumps by looking at skin samples under a microscope. The patient was diagnosed with a skin infection called molluscum contagiosum. The patient had not tried any other treatments before. The patient was given skin creams and pills, but the bumps did not get much better.

Discharge Diagnosis:
Diagnosis: Molluscum contagiosum (a common skin infection that causes small bumps)

Hospital Course Summary:
The patient was given a pill called isotretinoin at a dose based on body weight for one month. Th

In [10]:
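# Free memory before loading next model
import gc

del model_openbio, tokenizer_openbio
gc.collect()  # drop lingering Python references before releasing cached memory
if DEVICE == "cuda":
    torch.cuda.empty_cache()
elif DEVICE == "mps":
    torch.mps.empty_cache()  # on Apple Silicon the CUDA call alone frees nothing
print("Memory cleared.")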

Memory cleared.


## 6. Inference with Mistral-7B (Best Readability)

**Best readability score:**
- ROUGE-L: 0.6491
- SARI: 73.79
- BERTScore: 0.9464
- FK-Grade: **6.91** (closest to target ≤6)
- Improvement: +65.9% ROUGE-L over its zero-shot baseline
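FK-Grade is the Flesch-Kincaid grade level: 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. Lower values mean easier text. Below is a rough stdlib sketch of that formula. The regex sentence splitter and the vowel-group syllable counter are crude assumptions, so its scores will not exactly match the readability library used in the actual evaluation:

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent when the word has other vowel groups ("make" -> 1)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Jargon-heavy vs. plain phrasing of the same idea
print(f"{fk_grade('Hypertension necessitates pharmacological intervention.'):.2f}")
print(f"{fk_grade('High blood pressure needs medicine.'):.2f}")
```

The gap between the two toy sentences illustrates why replacing polysyllabic jargon drives the grade down so sharply: the syllables-per-word term carries a much larger weight (11.8) than the words-per-sentence term (0.39).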

In [11]:
print("=" * 70)
print("Loading Mistral-7B with LoRA adapter...")
print("=" * 70)

model_mistral, tokenizer_mistral, format_mistral = load_model("Mistral-7B")

print("\n✅ Mistral-7B loaded successfully!")

Loading Mistral-7B with LoRA adapter...
  Loading base model: mistralai/Mistral-7B-Instruct-v0.2


  Loading LoRA adapter: GuyDor007/MediSimplifier-LoRA-Adapters/mistral_7b_lora


  Loading tokenizer...

✅ Mistral-7B loaded successfully!


In [12]:
print("Generating simplified text with Mistral-7B...")
print("(This may take 1-2 minutes on CPU/MPS)\n")

output_mistral = run_inference(model_mistral, tokenizer_mistral, medical_text, format_mistral)

print("=" * 70)
print("Mistral-7B OUTPUT")
print("=" * 70)
print(output_mistral)

Generating simplified text with Mistral-7B...
(This may take 1-2 minutes on CPU/MPS)

Mistral-7B OUTPUT
Discharge Summary:

Patient Name: [REDACTED]
Medical Record Number: [REDACTED]

Hospital Course:
The patient came to our clinic with many skin-colored bumps on the face and neck. The bumps were round and raised. They slowly grew bigger and more bumps appeared over one year. The patient told us he has HIV. His immune cell count was 82 cells per cubic millimeter. The patient had been taking HIV medicine for five years. The doctors found out what was wrong because some bumps had a small white head inside. The patient was confirmed to have a skin infection called molluscum contagiosum. The patient had not tried skin creams or other medicines before. The patient was given skin creams and pills to try, but the bumps did not get better.

Discharge Diagnosis:
Diagnosis: Molluscum contagiosum (a skin infection that causes small bumps)

Hospital Course Summary:
The patient was given a pill cal

In [13]:
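# Free memory before loading next model
import gc

del model_mistral, tokenizer_mistral
gc.collect()  # drop lingering Python references before releasing cached memory
if DEVICE == "cuda":
    torch.cuda.empty_cache()
elif DEVICE == "mps":
    torch.mps.empty_cache()  # on Apple Silicon the CUDA call alone frees nothing
print("Memory cleared.")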

Memory cleared.


## 7. Inference with BioMistral-7B (Medical Baseline)

**Medical domain baseline:**
- ROUGE-L: 0.6318
- SARI: 73.01
- BERTScore: 0.9439
- FK-Grade: 6.95
- Improvement: +53.3% ROUGE-L over its zero-shot baseline

In [14]:
print("=" * 70)
print("Loading BioMistral-7B with LoRA adapter...")
print("=" * 70)

model_biomistral, tokenizer_biomistral, format_biomistral = load_model("BioMistral-7B")

print("\n✅ BioMistral-7B loaded successfully!")

Loading BioMistral-7B with LoRA adapter...
  Loading base model: BioMistral/BioMistral-7B-DARE


  Loading LoRA adapter: GuyDor007/MediSimplifier-LoRA-Adapters/biomistral_7b_dare_lora


  Loading tokenizer...


✅ BioMistral-7B loaded successfully!


In [15]:
print("Generating simplified text with BioMistral-7B...")
print("(This may take 1-2 minutes on CPU/MPS)\n")

output_biomistral = run_inference(model_biomistral, tokenizer_biomistral, medical_text, format_biomistral)

print("=" * 70)
print("BioMistral-7B OUTPUT")
print("=" * 70)
print(output_biomistral)

Generating simplified text with BioMistral-7B...
(This may take 1-2 minutes on CPU/MPS)

BioMistral-7B OUTPUT
Discharge Summary:

Patient Name: [REDACTED]
Medical Record Number: [REDACTED]

Hospital Course:
The patient came to our clinic with many small, skin-colored bumps on the face and neck. These bumps slowly grew in size and number over one year. The patient told us he has HIV. His immune cell count was 82, which is lower than normal. The patient has been taking HIV medicines for the past five years. The doctors found out what was causing the bumps. They saw small, round, raised spots in some of the bumps. The patient was told he has a common skin infection called molluscum contagiosum. The patient had tried many skin creams and pills before, but they did not help. The doctors gave the patient several skin creams and pills to use. The bumps did not get much better.

Discharge Diagnosis:
Diagnosis: Molluscum contagiosum (a common skin infection)

Hospital Course Summary:
The patien

In [16]:
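# Free memory
import gc

del model_biomistral, tokenizer_biomistral
gc.collect()  # drop lingering Python references before releasing cached memory
if DEVICE == "cuda":
    torch.cuda.empty_cache()
elif DEVICE == "mps":
    torch.mps.empty_cache()  # on Apple Silicon the CUDA call alone frees nothing
print("Memory cleared.")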

Memory cleared.


## 8. Results Comparison

In [17]:
print("=" * 70)
print("GROUND TRUTH (Claude-generated reference)")
print("=" * 70)
print(ground_truth)

GROUND TRUTH (Claude-generated reference)
Discharge Summary:

Patient Name: [REDACTED]
Medical Record Number: [REDACTED]

Hospital Course:
The patient came to our clinic with skin-colored, round, raised bumps on the face and neck. These bumps had been growing in size and number over one year. The patient shared that he has HIV, and his immune cell count was very low at 82 cells per cubic millimeter. The patient had been taking HIV medication for the last five years. The doctor diagnosed the condition by looking at the bumps, which had a typical appearance. The patient was confirmed to have molluscum contagiosum, a viral skin infection. The patient had not taken any pills for this skin problem before. The patient was given creams and pills, but the bumps did not get much better.

Discharge Diagnosis:
Diagnosis: Molluscum contagiosum, a viral skin infection

Hospital Course Summary:
The patient was given a pill called isotretinoin at a dose based on body weight for one month. This treatm

In [18]:
# Summary statistics
print("\n" + "=" * 70)
print("LENGTH COMPARISON")
print("=" * 70)
print(f"{'Source':<25} {'Characters':>12} {'Ratio':>10}")
print("-" * 50)
print(f"{'Original Input':<25} {len(medical_text):>12,} {1.00:>10.2f}")
print(f"{'Ground Truth':<25} {len(ground_truth):>12,} {len(ground_truth)/len(medical_text):>10.2f}")
print(f"{'OpenBioLLM-8B':<25} {len(output_openbio):>12,} {len(output_openbio)/len(medical_text):>10.2f}")
print(f"{'Mistral-7B':<25} {len(output_mistral):>12,} {len(output_mistral)/len(medical_text):>10.2f}")
print(f"{'BioMistral-7B':<25} {len(output_biomistral):>12,} {len(output_biomistral)/len(medical_text):>10.2f}")


LENGTH COMPARISON
Source                      Characters      Ratio
--------------------------------------------------
Original Input                   2,609       1.00
Ground Truth                     2,377       0.91
OpenBioLLM-8B                    2,229       0.85
Mistral-7B                       2,475       0.95
BioMistral-7B                    2,460       0.94
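Character ratios are only a coarse proxy for simplification. The prompt also asks for short sentences (15-20 words), and that can be checked directly on each output string. Here is a minimal sketch, assuming a naive split on `.`, `!` and `?`. Abbreviations such as "e.g." or "Dr." and colon-terminated headers like "Discharge Summary:" will skew the counts. `sentence_stats` is a hypothetical helper, and in the notebook it could be applied to `output_openbio`, `output_mistral`, or `output_biomistral`:

```python
import re
from statistics import mean


def sentence_stats(text: str) -> dict:
    """Return sentence count plus mean and max words per sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return {"sentences": 0, "mean_words": 0.0, "max_words": 0}
    return {
        "sentences": len(lengths),
        "mean_words": round(float(mean(lengths)), 2),
        "max_words": max(lengths),
    }


sample = "The patient has HIV. The patient took HIV medicine for five years."
print(sentence_stats(sample))  # {'sentences': 2, 'mean_words': 6.0, 'max_words': 8}
```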


## 9. Model Performance Summary

Results from full test set evaluation (1,001 samples):

In [19]:
# Performance metrics from full evaluation
print("=" * 80)
print("FULL TEST SET PERFORMANCE (1,001 samples)")
print("=" * 80)
print(f"{'Model':<20} {'ROUGE-L':>10} {'SARI':>10} {'BERTScore':>12} {'FK-Grade':>10} {'Δ vs Base':>12}")
print("-" * 80)
print(f"{'OpenBioLLM-8B 🏆':<20} {'0.6749':>10} {'74.64':>10} {'0.9498':>12} {'7.16':>10} {'+157.3%':>12}")
print(f"{'Mistral-7B':<20} {'0.6491':>10} {'73.79':>10} {'0.9464':>12} {'6.91':>10} {'+65.9%':>12}")
print(f"{'BioMistral-7B':<20} {'0.6318':>10} {'73.01':>10} {'0.9439':>12} {'6.95':>10} {'+53.3%':>12}")
print("-" * 80)
print(f"{'Ground Truth FK:':<20} {'7.23':>10}")
print(f"{'Source FK:':<20} {'14.50':>10}")
print(f"{'Target FK:':<20} {'≤6.0':>10}")
print("\n📊 All models cut the Flesch-Kincaid grade level by ~50% (college → 7th-grade level)")

FULL TEST SET PERFORMANCE (1,001 samples)
Model                   ROUGE-L       SARI    BERTScore   FK-Grade    Δ vs Base
--------------------------------------------------------------------------------
OpenBioLLM-8B 🏆          0.6749      74.64       0.9498       7.16      +157.3%
Mistral-7B               0.6491      73.79       0.9464       6.91       +65.9%
BioMistral-7B            0.6318      73.01       0.9439       6.95       +53.3%
--------------------------------------------------------------------------------
Ground Truth FK:           7.23
Source FK:                14.50
Target FK:                 ≤6.0

📊 All models cut the Flesch-Kincaid grade level by ~50% (college → 7th-grade level)
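The "~50%" figure refers to the relative drop in Flesch-Kincaid grade from the source text (14.50) to each output. Here is the arithmetic, using the values from the table above:

```python
# Flesch-Kincaid grades copied from the full-test-set table above
SOURCE_FK = 14.50
output_fk = {
    "OpenBioLLM-8B": 7.16,
    "Mistral-7B": 6.91,
    "BioMistral-7B": 6.95,
    "Ground Truth": 7.23,
}

# Relative drop in grade level: (source - output) / source
reductions = {name: (SOURCE_FK - fk) / SOURCE_FK for name, fk in output_fk.items()}

for name, r in reductions.items():
    print(f"{name:<15} FK {output_fk[name]:5.2f}  reduction {r:.1%}")
```

Every row lands between roughly 50% and 52%, including the Claude-generated reference, while all of them remain above the ≤6.0 target.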


## 10. Key Findings

### Ranking Reversal
The worst zero-shot model (OpenBioLLM) achieved the **best** fine-tuned performance:

| Model | Zero-Shot ROUGE-L | Fine-Tuned ROUGE-L | Improvement |
|-------|-------------------|--------------------|--------------|
| OpenBioLLM-8B | 0.2623 (worst) | **0.6749** (best) | +157% |
| Mistral-7B | 0.3912 | 0.6491 | +66% |
| BioMistral-7B | 0.4120 (best) | 0.6318 (worst) | +53% |
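The Improvement column is the relative ROUGE-L gain of each fine-tuned model over its own zero-shot score, (fine-tuned - zero-shot) / zero-shot. Recomputing it from the table above reproduces the "Δ vs Base" figures in Section 9:

```python
# (zero-shot, fine-tuned) ROUGE-L pairs copied from the table above
rouge_l = {
    "OpenBioLLM-8B": (0.2623, 0.6749),
    "Mistral-7B": (0.3912, 0.6491),
    "BioMistral-7B": (0.4120, 0.6318),
}

# Relative gain over each model's own zero-shot baseline
improvement = {name: (ft - zs) / zs for name, (zs, ft) in rouge_l.items()}

for name, gain in improvement.items():
    print(f"{name:<15} {gain:+.1%}")
```

Because the gain is relative to each model's own starting point, the weakest zero-shot model has the most room to grow. This is why its percentage is so much larger even though its absolute fine-tuned lead is only a few ROUGE-L points.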

### Statistical Significance
- All pairwise ROUGE-L differences are significant (p < 0.001)
- Effect size: OpenBioLLM vs BioMistral = medium (Cohen's d = 0.79)
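Cohen's d expresses a difference in means in units of standard deviation. The notebook does not state which variant was used. Because both models are scored on the same 1,001 test samples, a paired variant (mean of per-sample differences divided by their standard deviation) is one plausible choice, while the pooled-SD form is the textbook two-group definition. Below is a stdlib sketch of both. The toy scores are made up for illustration and are not from the evaluation:

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_pooled(a, b):
    """Two-group Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def cohens_d_paired(a, b):
    """Paired Cohen's d (d_z): mean of per-sample differences over their standard deviation."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)


# Toy per-sample ROUGE-L scores for two models on the same five samples
model_a = [0.70, 0.65, 0.72, 0.60, 0.68]
model_b = [0.62, 0.60, 0.66, 0.58, 0.61]
print(f"pooled d = {cohens_d_pooled(model_a, model_b):.2f}")
print(f"paired d = {cohens_d_paired(model_a, model_b):.2f}")
```

By Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large), the reported d = 0.79 sits at the upper edge of "medium".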

## 11. Resources

### HuggingFace
- **Models:** https://huggingface.co/GuyDor007/MediSimplifier-LoRA-Adapters
- **Dataset:** https://huggingface.co/datasets/GuyDor007/medisimplifier-dataset

### GitHub
- **Code & Notebooks:** https://github.com/gd007/MediSimplifier

### Prompt Formats
Models were trained with their **native formats**:

**ChatML (OpenBioLLM-8B):**
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{instruction}

{input}<|im_end|>
<|im_start|>assistant
```

**Mistral (Mistral-7B, BioMistral-7B):**
```
[INST] <<SYS>>
{system_message}
<</SYS>>

{instruction}

{input} [/INST]
```
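The two templates above can also be stored as plain format strings and filled with `str.format`. Here is a minimal sketch. The names `CHATML_TEMPLATE`, `MISTRAL_TEMPLATE`, and `fill_template` are hypothetical, and they are written to produce the same strings as `build_prompt`. Note that the ChatML prompt ends with a newline after the assistant header, while the Mistral prompt ends directly at `[/INST]`:

```python
# Templates mirroring build_prompt; placeholders match the ones shown above
CHATML_TEMPLATE = (
    "<|im_start|>system\n{system_message}<|im_end|>\n"
    "<|im_start|>user\n{instruction}\n\n{input}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
MISTRAL_TEMPLATE = (
    "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
    "{instruction}\n\n{input} [/INST]"
)
TEMPLATES = {"chatml": CHATML_TEMPLATE, "mistral": MISTRAL_TEMPLATE}


def fill_template(format_type: str, system_message: str, instruction: str, text: str) -> str:
    """Fill the chosen template; braces inside the substituted values are left untouched."""
    if format_type not in TEMPLATES:
        raise ValueError(f"Unknown format type: {format_type}")
    return TEMPLATES[format_type].format(
        system_message=system_message, instruction=instruction, input=text
    )


print(fill_template("mistral", "SYS", "INSTR", "TEXT"))
```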

### Citation
```bibtex
@misc{medisimplifier2026,
  title={MediSimplifier: Medical Discharge Summary Simplification using LoRA Fine-Tuning},
  author={Dor, Guy and Avraham, Shmulik},
  year={2026},
  institution={Technion - Israel Institute of Technology}
}
```

---
**End of Notebook**