
MediSimplifier

LoRA Fine-Tuning for Medical Discharge Summary Simplification


Overview

MediSimplifier fine-tunes open-source LLMs using LoRA to simplify medical discharge summaries to a 6th-grade reading level, improving patient comprehension of their medical documents.

Course Project: Technion DS25 Deep Learning
Authors: Guy Dor, Shmulik Avraham


🚀 Getting Started

Try the Models Now

👉 Open notebooks/MediSimplifier_Inference_Demo.ipynb to get started!

The demo notebook provides everything you need (a minimal loading sketch follows this list):

  • ✅ Load all three fine-tuned models from HuggingFace
  • ✅ Correct prompt formats for each model architecture
  • ✅ Run inference on medical discharge summaries
  • ✅ Compare outputs across models
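
For reference, the loading-and-inference flow in the demo looks roughly like the sketch below. The base-model and adapter repo IDs are assumptions; the notebook contains the exact names and the per-model prompt templates.

# Minimal sketch: load one fine-tuned adapter and simplify a discharge summary.
# BASE_MODEL and ADAPTER are assumed IDs; see the demo notebook for the real ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "aaditya/Llama3-OpenBioLLM-8B"       # assumed base checkpoint
ADAPTER = "gd007/MediSimplifier-LoRA-Adapters"    # assumed adapter repo (see Resources)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)  # attach the LoRA weights

prompt = "Simplify this discharge summary to a 6th-grade reading level:\n\n<summary text>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))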

Installation

git clone https://github.com/gd007/MediSimplifier.git
cd MediSimplifier
pip install -r requirements.txt

Then open the inference demo notebook and run the cells.


Key Results

Model                ROUGE-L   SARI    BERTScore   FK Grade   Improvement vs. zero-shot
OpenBioLLM-8B 🏆     0.6749    74.64   0.9498      7.16       +157.3%
Mistral-7B           0.6491    73.79   0.9464      6.91       +65.9%
BioMistral-7B-DARE   0.6318    73.01   0.9439      6.95       +53.3%

Achievement: roughly 50% reduction in Flesch-Kincaid grade level across models (FK 14.5 → ~7.0)
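
The metrics above can be reproduced with off-the-shelf libraries; the sketch below assumes the Hugging Face evaluate package and textstat, while the project's own evaluation pipeline lives in MediSimplifier_Part4.ipynb and may differ in detail.

# Sketch only: metric names match the results table; package choice is an assumption.
import evaluate
import textstat

rouge = evaluate.load("rouge")
sari = evaluate.load("sari")
bertscore = evaluate.load("bertscore")

sources = ["<original discharge summary>"]         # model input
predictions = ["<model simplification>"]           # model output
references = [["<ground-truth simplification>"]]   # one or more references per example

print(rouge.compute(predictions=predictions, references=[r[0] for r in references])["rougeL"])
print(sari.compute(sources=sources, predictions=predictions, references=references)["sari"])
print(bertscore.compute(predictions=predictions, references=[r[0] for r in references], lang="en")["f1"][0])
print(textstat.flesch_kincaid_grade(predictions[0]))   # Flesch-Kincaid grade level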

Project Structure

├── notebooks/
│   ├── MediSimplifier_Inference_Demo.ipynb  # 👈 START HERE
│   ├── MediSimplifier_Part1.ipynb           # Data prep & ground truth generation
│   ├── MediSimplifier_Part2.ipynb           # Baseline evaluation
│   ├── MediSimplifier_Part3.ipynb           # LoRA fine-tuning & ablation
│   └── MediSimplifier_Part4.ipynb           # Evaluation & analysis
├── results/
│   ├── ablation/                            # Ablation study results
│   ├── baseline/                            # Zero-shot baseline metrics
│   ├── evaluation/                          # Final evaluation metrics
│   ├── training/                            # Training logs
│   └── figures/                             # All visualizations
├── MediSimplifier_IEEE_Paper.pdf            # 📄 Final report
├── MediSimplifier_Final_Presentation.pdf    # 📊 Presentation
└── MediSimplifier_Master_Document.md

Methodology

Dataset

Source clinical notes come from the Asclepius dataset (starmpcc; see Acknowledgments). Ground-truth simplifications were generated in MediSimplifier_Part1.ipynb, and the resulting dataset is published on Hugging Face as medisimplifier-dataset (see Resources).
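
A hypothetical loading sketch (the owner/repo ID below is assumed; the actual dataset is linked under Resources):

from datasets import load_dataset

ds = load_dataset("gd007/medisimplifier-dataset")   # assumed repo ID
print(ds)               # splits and column names
print(ds["train"][0])   # one original/simplified pair
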
Models Compared

Model                      Type      Architecture   Prompt Format
OpenBioLLM-8B              Medical   Llama 3        ChatML
BioMistral-7B-DARE         Medical   Mistral        Mistral
Mistral-7B-Instruct-v0.2   General   Mistral        Mistral
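
The two prompt formats differ in how the instruction is wrapped. The wrappers below are illustrative only; the exact instruction text and any system prompt used in training are defined in the notebooks.

# Illustrative prompt wrappers; instruction wording is a placeholder.

def chatml_prompt(summary: str) -> str:
    # ChatML-style wrapper (used here for OpenBioLLM-8B)
    return ("<|im_start|>user\n"
            f"Simplify this discharge summary to a 6th-grade reading level:\n{summary}\n"
            "<|im_end|>\n<|im_start|>assistant\n")

def mistral_prompt(summary: str) -> str:
    # Mistral [INST] wrapper (BioMistral-7B-DARE, Mistral-7B-Instruct-v0.2)
    return (f"[INST] Simplify this discharge summary to a 6th-grade reading level:\n"
            f"{summary} [/INST]")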

LoRA Configuration (Optimal)

Parameter          Value
Rank (r)           32
Alpha (α)          64
Target Modules     q, k, v, o projections
rsLoRA             True
Trainable Params   27.3M (0.38%)
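
The values above map onto peft's LoraConfig roughly as follows. This is a sketch: module names assume a Llama/Mistral-style attention block, and dropout or other hyperparameters are not reported in the table.

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=32,                                   # rank
    lora_alpha=64,                          # alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    use_rslora=True,                        # rank-stabilized scaling: alpha / sqrt(r)
    task_type=TaskType.CAUSAL_LM,
)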

Ablation Study Findings

Phase       Finding
Rank        r=32 optimal (contradicts Hu et al., 2021)
Modules     all_attn best despite 2× parameters
Data Size   More data = better (+5.5–6.6% ROUGE-L)
rsLoRA      Adopted based on literature

Key Research Findings

  1. Ranking Reversal: the worst zero-shot model (OpenBioLLM-8B) achieved the best fine-tuned performance (+157%)
  2. Medical Pretraining: the advantage of medical pretraining disappears after task-specific fine-tuning
  3. Consistent Success: all models cut the reading grade level by roughly 50%
  4. Statistical Significance: all pairwise ROUGE-L differences are significant (p < 0.001)

Resources

Resource            Link
🚀 Inference Demo   MediSimplifier_Inference_Demo.ipynb
🤗 Models           MediSimplifier-LoRA-Adapters
🤗 Dataset          medisimplifier-dataset
📄 Paper            MediSimplifier_IEEE_Paper.pdf
📊 Presentation     MediSimplifier_Final_Presentation.pdf

Citation

@misc{medisimplifier2026,
  author = {Dor, Guy and Avraham, Shmulik},
  title = {MediSimplifier: LoRA Fine-Tuning for Medical Discharge Summary Simplification},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/gd007/MediSimplifier}}
}

License

Apache 2.0 - See LICENSE for details.

Acknowledgments

  • Technion DS25 Deep Learning Course
  • Base model teams: OpenBioLLM, Mistral, BioMistral
  • Asclepius dataset creators (starmpcc)
