@piradiusquared piradiusquared commented Oct 14, 2025

Expert to Layman Radiology Reports

Overview:

This project fine-tunes an existing encoder-decoder LLM, Flan-T5, to summarise expert radiology reports from the BioLaySumm 2025 dataset into layman-friendly text.
Fine-tuning was performed on the rangpur cluster, with full training on an NVIDIA A100 GPU.
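As a rough illustration of the kind of import that modules.py performs, a Flan-T5 checkpoint can be loaded through Hugging Face transformers. The checkpoint name, prompt wording, and generation settings below are assumptions for illustration, not this project's exact configuration:

```python
# Minimal sketch: load a Flan-T5 checkpoint and generate a lay summary.
# "google/flan-t5-small" and the prompt text are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

report = "Chest X-ray shows mild cardiomegaly with no acute infiltrate."
prompt = f"Summarize this radiology report for a layperson: {report}"

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```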

Files:

constants.py - File containing all parameters and other constant variables
dataset.py - Custom dataset loader using Pandas for model training and evaluation
modules.py - Importer for the pre-trained Flan-T5 model
predict.py - Model benchmarking on unseen validation data split
requirements.txt - List of all dependencies and their respective versions
train.py - Custom training and evaluation loop on the train data split
assets/ - Folder for all images required by the README
runners/ - Folder containing train and benchmark scripts for use on rangpur
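In the spirit of dataset.py, a pandas-backed PyTorch dataset for Seq2Seq training might look like the sketch below. The column names (`report`, `lay_summary`) and the maximum sequence length are assumptions, not the project's actual schema:

```python
# Hedged sketch of a pandas-backed dataset for Seq2Seq fine-tuning.
# Column names and max_len are illustrative assumptions.
import pandas as pd
from torch.utils.data import Dataset

class LaySummaryDataset(Dataset):
    def __init__(self, df: pd.DataFrame, tokenizer, max_len: int = 512):
        self.df = df.reset_index(drop=True)
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # Tokenise the expert report as the input sequence.
        enc = self.tokenizer(row["report"], truncation=True,
                             max_length=self.max_len, return_tensors="pt")
        # Tokenise the reference lay summary as the target labels.
        labels = self.tokenizer(row["lay_summary"], truncation=True,
                                max_length=self.max_len, return_tensors="pt")
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "labels": labels["input_ids"].squeeze(0),
        }
```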

Result of Model Fine-tuning:

ROUGE scores:

  • rouge1: 71.19
  • rouge2: 51.83
  • rougeL: 65.80
  • rougeLsum: 65.82

Perplexity scores (extra metric):

  • On average, the fine-tuned model is 2 to 4 times more confident than the base model in selecting the next appropriate token (word) for summaries.
  • Average perplexity score is consistently < 3
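Perplexity is the exponential of the mean cross-entropy loss, so a perplexity below 3 means the model is, on average, about as uncertain as if it were choosing uniformly among fewer than three candidate tokens. A minimal sketch of the calculation (the loss values are illustrative, not this project's measurements):

```python
# Perplexity = exp(mean cross-entropy loss) over the validation set.
# The `losses` values below are illustrative assumptions.
import math

losses = [1.05, 0.98, 1.02]           # per-batch validation losses
mean_loss = sum(losses) / len(losses)
perplexity = math.exp(mean_loss)
print(f"perplexity = {perplexity:.2f}")
```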

@piradiusquared piradiusquared changed the title Initial Pull Request FLAN-T5 Laymann Oct 16, 2025
@piradiusquared piradiusquared changed the title FLAN-T5 Laymann FLAN-T5 Layman Oct 16, 2025
…gFace without their API. Still under testing.
…values for Seq2Seq training.

Optimiser used is AdamW, as used before in demos.
… training loop from pytorch.org and huggingface tutorials.
… most basic implementation. Rouge scores are printed at the end of each epoch.
…fficial layman report in the dataset to fine tuned model.
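The basic training loop described in these commits (AdamW optimiser, loss taken from the model's own output, metrics reported per epoch) could be sketched as follows. The function name, learning rate, and loop structure are assumptions for illustration, not the project's exact code:

```python
# Hedged sketch of a basic Seq2Seq training epoch. Hugging Face
# Seq2Seq models return a cross-entropy loss when `labels` are passed.
import torch
from torch.optim import AdamW

def train_one_epoch(model, loader, optimizer, device):
    """Run one training epoch and return the mean training loss."""
    model.train()
    total_loss = 0.0
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss    # cross-entropy over target tokens
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / max(len(loader), 1)
```

The optimiser would be created once before the epoch loop, e.g. `optimizer = AdamW(model.parameters(), lr=3e-5)` (the learning rate here is an assumption), so its internal state persists across epochs.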
@piradiusquared piradiusquared changed the title FLAN-T5 Layman Translating Expert Radiology Reports into Layman Summaries using pre-trained Flan-T5 Model. Task #13 Oct 31, 2025
Added constants for dataset links, training parameters, and model prompt.
Added docstrings to methods for better documentation.
Added docstrings to FlanModel methods for clarity.
Added a comment to clarify the perplexity calculation.
Added detailed docstrings for training and evaluation methods, improved comments for clarity, and included optimizer setup in the training script.
@Claire1217

Recognition Problem: total: 20
Solves problem: Good data augmentation design. Very good reasoning from test run 1 to test run 2. (5)
Implementation functions: Well-organized code. Good practice to keep your constants in an individual file. (3)
Good design: good (1)
Commenting: Well commented (1)
Difficulty: Hard (10)

@wangzhaomxy

s4885380


gayanku commented Nov 24, 2025

Marking

Good/OK/Fair Practice (Design/Commenting, TF/Torch Usage)
Good design and implementation.
Spacing and comments.
No Header blocks. -1
Recognition Problem
Good solution to problem.
Driver Script present.
File structure present.
Good Usage & Demo & Visualisation & Data usage.
Module present.
Commenting present.
No Data leakage found.
Difficulty: Hard. Hard difficulty: LLM.
Commit Log
Good Meaningful commit messages.
Good Progressive commits.
Documentation
Readme: Good.
Model/technical explanation: Good.
Description and Comments: Good.
Markdown used and PDF submitted.
Pull Request
Successful Pull Request (Working Algorithm Delivered on Time in Correct Branch).
No Feedback required.
Request Description is good.
TOTAL: -1

Marked as per the due date; for fairness, changes made after it are not necessarily allowed to contribute to the grade.
Subject to approval from Shakes

