MediSimplifier: LoRA Fine-Tuning for Medical Discharge Summary Simplification
MediSimplifier fine-tunes open-source LLMs using LoRA to simplify medical discharge summaries to a 6th-grade reading level, improving patient comprehension of their medical documents.
Course Project: Technion DS25 Deep Learning
Authors: Guy Dor, Shmulik Avraham
👉 Open notebooks/MediSimplifier_Inference_Demo.ipynb to get started!
The demo notebook provides everything you need (a minimal standalone sketch also follows this list):
- ✅ Load all three fine-tuned models from HuggingFace
- ✅ Correct prompt formats for each model architecture
- ✅ Run inference on medical discharge summaries
- ✅ Compare outputs across models
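If you prefer to run the models outside the notebook, the sketch below shows the general pattern: load a base checkpoint, attach a LoRA adapter with PEFT, and generate. The adapter repo id and prompt wording are assumptions (taken from the Resources table and the Mistral prompt format); the demo notebook has the exact values for each model.

```python
# Minimal inference sketch, not the notebook's exact code.
# ADAPTER_REPO is an assumed id based on the Resources table; check the
# Hugging Face link there for the real adapter paths.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER_REPO = "gd007/MediSimplifier-LoRA-Adapters"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)  # attach the LoRA weights

prompt = (
    "[INST] Simplify this discharge summary to a 6th-grade reading level:\n"
    "<discharge summary text> [/INST]"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```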
git clone https://github.com/gd007/MediSimplifier.git
cd MediSimplifier
pip install -r requirements.txt
Then open the inference demo notebook and run the cells.
| Model | ROUGE-L | SARI | BERTScore | FK Grade | Improvement vs. Zero-Shot |
|---|---|---|---|---|---|
| OpenBioLLM-8B 🏆 | 0.6749 | 74.64 | 0.9498 | 7.16 | +157.3% |
| Mistral-7B | 0.6491 | 73.79 | 0.9464 | 6.91 | +65.9% |
| BioMistral-7B-DARE | 0.6318 | 73.01 | 0.9439 | 6.95 | +53.3% |
Achievement: ~50% readability reduction (Flesch-Kincaid grade 14.5 → ~7.0)
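FK Grade above is the Flesch-Kincaid grade level of the generated summaries. A quick way to sanity-check readability on your own text is sketched below, assuming the `textstat` package; the project's evaluation code may use a different implementation, and the file names are placeholders.

```python
# Readability check sketch, assuming the `textstat` package.
# File names are placeholders, not files shipped with this repo.
import textstat

original = open("discharge_summary.txt").read()       # placeholder source text
simplified = open("simplified_summary.txt").read()    # placeholder model output

fk_before = textstat.flesch_kincaid_grade(original)
fk_after = textstat.flesch_kincaid_grade(simplified)
reduction = (fk_before - fk_after) / fk_before
print(f"FK grade: {fk_before:.1f} -> {fk_after:.1f} ({reduction:.0%} reduction)")
```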
├── notebooks/
│ ├── MediSimplifier_Inference_Demo.ipynb # 👈 START HERE
│ ├── MediSimplifier_Part1.ipynb # Data prep & ground truth generation
│ ├── MediSimplifier_Part2.ipynb # Baseline evaluation
│ ├── MediSimplifier_Part3.ipynb # LoRA fine-tuning & ablation
│ └── MediSimplifier_Part4.ipynb # Evaluation & analysis
├── results/
│ ├── ablation/ # Ablation study results
│ ├── baseline/ # Zero-shot baseline metrics
│ ├── evaluation/ # Final evaluation metrics
│ ├── training/ # Training logs
│ └── figures/ # All visualizations
├── MediSimplifier_IEEE_Paper.pdf # 📄 Final report
├── MediSimplifier_Final_Presentation.pdf # 📊 Presentation
└── MediSimplifier_Master_Document.md
- Source: Asclepius-Synthetic-Clinical-Notes (10K samples)
- Ground Truth: Generated using Claude Opus 4.5
- Splits: Train (7,999) / Val (999) / Test (1,001)
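The splits can be pulled with the `datasets` library as sketched below; the repo id is an assumption based on the dataset name in the Resources table.

```python
# Sketch only: the dataset id is assumed from the "medisimplifier-dataset"
# link in the Resources table.
from datasets import load_dataset

ds = load_dataset("gd007/medisimplifier-dataset")  # hypothetical repo id
print({split: len(ds[split]) for split in ds})     # expected: 7,999 / 999 / 1,001
```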
| Model | Type | Architecture | Prompt Format |
|---|---|---|---|
| OpenBioLLM-8B | Medical | Llama3 | ChatML |
| BioMistral-7B-DARE | Medical | Mistral | Mistral |
| Mistral-7B-Instruct-v0.2 | General | Mistral | Mistral |
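The two prompt styles differ only in their chat markup. A rough sketch is below; the instruction text is illustrative, and the exact prompts used for training and inference are in the notebooks.

```python
# Illustrative prompt builders; the instruction wording is an example only.
INSTRUCTION = "Simplify this discharge summary to a 6th-grade reading level:"

def chatml_prompt(summary: str) -> str:
    """ChatML-style turns (used here for OpenBioLLM-8B)."""
    return (
        "<|im_start|>user\n"
        f"{INSTRUCTION}\n{summary}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def mistral_prompt(summary: str) -> str:
    """Mistral [INST] format (BioMistral-7B-DARE, Mistral-7B-Instruct-v0.2)."""
    return f"[INST] {INSTRUCTION}\n{summary} [/INST]"
```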
| Parameter | Value |
|---|---|
| Rank (r) | 32 |
| Alpha (α) | 64 |
| Target Modules | q, k, v, o projections |
| rsLoRA | True |
| Trainable Params | 27.3M (0.38%) |
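In PEFT terms the table corresponds roughly to the configuration below; `lora_dropout` and `task_type` are assumptions not listed above, while rank, alpha, target modules, and rsLoRA come straight from the table.

```python
# LoRA configuration mirroring the table above (PEFT).
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                                                     # rank
    lora_alpha=64,                                            # alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    use_rslora=True,                                          # rank-stabilized scaling
    lora_dropout=0.05,                                        # assumed, not in the table
    task_type="CAUSAL_LM",                                    # assumed
)
# get_peft_model(base_model, lora_config).print_trainable_parameters()
# should report roughly 27.3M trainable parameters (~0.38%).
```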
| Phase | Finding |
|---|---|
| Rank | r=32 optimal, contradicting the small ranks Hu et al. (2021) report as sufficient |
| Modules | Targeting all attention projections (q, k, v, o) works best despite roughly 2x the trainable parameters |
| Data Size | More training data consistently helps (+5.5-6.6% ROUGE-L) |
| rsLoRA | Adopted based on the literature |
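A hypothetical sketch of how such an ablation grid could be assembled is shown below; the actual sweep (including the exact rank values and module sets) lives in MediSimplifier_Part3.ipynb.

```python
# Hypothetical ablation grid; rank values and module sets are assumptions.
from peft import LoraConfig

ranks = [8, 16, 32, 64]
module_sets = {
    "qv_only":  ["q_proj", "v_proj"],
    "all_attn": ["q_proj", "k_proj", "v_proj", "o_proj"],
}
configs = {
    (r, name): LoraConfig(
        r=r,
        lora_alpha=2 * r,          # keep alpha/r fixed across the sweep
        target_modules=modules,
        use_rslora=True,
        task_type="CAUSAL_LM",
    )
    for r in ranks
    for name, modules in module_sets.items()
}
# Each config would then be trained and scored on validation ROUGE-L.
```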
- Ranking Reversal: Worst zero-shot model (OpenBioLLM) achieved best fine-tuned performance (+157%)
- Medical Pretraining: Advantage disappears after task-specific fine-tuning
- Consistent Success: All models achieve ~50% readability reduction
- Statistical Significance: All pairwise ROUGE-L differences significant (p < 0.001)
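The section does not state which significance test was used; one common option for paired per-example ROUGE-L scores is the Wilcoxon signed-rank test, sketched with synthetic placeholder data below.

```python
# Illustrative significance check with synthetic scores; the real per-example
# ROUGE-L values come from the evaluation notebook.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
rouge_model_a = rng.uniform(0.60, 0.70, size=1001)  # placeholder per-example scores
rouge_model_b = rng.uniform(0.60, 0.70, size=1001)

stat, p_value = wilcoxon(rouge_model_a, rouge_model_b)
print(f"Wilcoxon signed-rank p-value: {p_value:.3g}")
```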
| Resource | Link |
|---|---|
| 🚀 Inference Demo | MediSimplifier_Inference_Demo.ipynb |
| 🤗 Models | MediSimplifier-LoRA-Adapters |
| 🤗 Dataset | medisimplifier-dataset |
| 📄 Paper | MediSimplifier_IEEE_Paper.pdf |
| 📊 Presentation | MediSimplifier_Final_Presentation.pdf |
@misc{medisimplifier2026,
author = {Dor, Guy and Avraham, Shmulik},
title = {MediSimplifier: LoRA Fine-Tuning for Medical Discharge Summary Simplification},
year = {2026},
publisher = {GitHub},
howpublished = {\url{https://github.com/gd007/MediSimplifier}}
}
Apache 2.0 - See LICENSE for details.
- Technion DS25 Deep Learning Course
- Base model teams: OpenBioLLM, Mistral, BioMistral
- Asclepius dataset creators (starmpcc)