Subscribe to us: https://groups.google.com/u/2/g/bodymaps
We introduce TextoMorph, a novel text-driven approach that allows textual control over tumor features such as texture, heterogeneity, boundaries, and pathology type to improve AI training. The text is extracted from medical reports (e.g., radiology/pathology reports).
Text-Driven Tumor Synthesis
Xinran Li1,2, Yi Shuai3,4, Chen Liu1,5, Qi Chen1, Tianyu Lin1, Pengfei Guo6, Dong Yang6, Can Zhao6, Pedro R. A. S. Bassi1,7,8, Daguang Xu6, Kang Wang9, Yang Yang9, Alan Yuille1, Zongwei Zhou1,*
1 Johns Hopkins University
2 Yale University 3 The First Affiliated Hospital of Sun Yat-sen University
4 Sun Yat-sen University 5 Hong Kong Polytechnic University
6 NVIDIA 7 University of Bologna 8 Italian Institute of Technology 9 University of California, San Francisco
IEEE TMI 2026
We collect papers related to medical data synthesis in Awesome Synthetic Tumors.
git clone https://github.com/MrGiovanni/TextoMorph.git
cd TextoMorph
See installation to obtain requirements and download dataset.
Note
TL;DR: We provide pre-trained Diffusion Models for liver, pancreas, and kidney, which can be directly used for tumor synthesis. Download them to Segmentation/TumorGeneration/model_weight
| Tumor | Download |
|---|---|
| Liver | Download |
| Pancreas | Download |
| Kidney | Download |
Our Autoencoder Model was trained on the AbdomenAtlas 1.1 dataset. This checkpoint can be used directly for the Diffusion Model. Download it to Diffusion/pretrained_models/:
cd Diffusion/pretrained_models
wget https://huggingface.co/MrGiovanni/DiffTumor/resolve/main/AutoencoderModel/AutoencoderModel.ckpt
cd ../..
Due to licensing constraints, we are unable to provide the training CT datasets. However, to assist in training your own model, we have made the descriptive words used during training available in the following files:
If you wish to train your own model, you can rewrite these real_tumor.txt files using the following format:
CT_id Label_id t1 t2 ... t100
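As a rough illustration of the line format above, the following sketch reads and writes one such line. It assumes fields are separated by whitespace and that each descriptive word is a single token; the names `TumorDescription`, `parse_description_line`, and `format_description_line`, and the sample identifiers, are hypothetical and not part of the repository.

```python
from typing import List, NamedTuple


class TumorDescription(NamedTuple):
    """One line of a real_tumor.txt file (hypothetical container)."""
    ct_id: str
    label_id: str
    words: List[str]  # descriptive words t1, t2, ...


def parse_description_line(line: str) -> TumorDescription:
    """Split 'CT_id Label_id t1 t2 ...' into its parts.

    Assumption: fields are whitespace-separated and every
    descriptive word is a single token.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected CT_id, Label_id and at least one word")
    return TumorDescription(fields[0], fields[1], fields[2:])


def format_description_line(entry: TumorDescription) -> str:
    """Write an entry back in the same space-separated layout."""
    return " ".join([entry.ct_id, entry.label_id, *entry.words])


# Hypothetical identifiers and words, for illustration only.
example = "CT_0001 Label_0001 hypoattenuating heterogeneous ill-defined"
entry = parse_description_line(example)
print(entry.ct_id, entry.label_id, len(entry.words))
```

A round trip through `format_description_line(parse_description_line(line))` can serve as a quick sanity check before training on a rewritten file.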
cd Diffusion/
vqgan_ckpt="pretrained_models/AutoencoderModel.ckpt" # your-datapath
datapath="/ccvl/net/ccvl15/xinran/CT/" # your-datapath
tumorlabel="/ccvl/net/ccvl15/xinran/Tumor/liver/" # your-datapath
python train.py dataset.name=liver_tumor dataset.data_root_path=$datapath dataset.label_root_path=$tumorlabel dataset.dataset_list=['liver'] dataset.uniform_sample=False model.results_folder_postfix="liver" model.vqgan_ckpt=$vqgan_ckpt
Note
TL;DR: We provide Segmentation Models for liver, pancreas, and kidney, which can be directly used for tumor segmentation. Download all checkpoints
Liver Tumor Segmentation
| Method | Tumor size d < 20 mm | 20 ≤ d < 50 mm | d ≥ 50 mm | DSC (%) | NSD (%) |
|---|---|---|---|---|---|
| RealTumor (checkpoint) | 64.5 (20/31) | 69.7 (53/76) | 66.7 (38/57) | 59.1 ± 30.4 | 60.1 ± 30.0 |
| SynTumor (paper, checkpoint) | 71.0 (22/31) | 69.7 (53/76) | 73.7 (42/57) | 62.3 ± 12.7 | 87.7 ± 21.4 |
| Pixel2Cancer (paper, checkpoint) | - | - | - | 57.2 ± 21.3 | 63.1 ± 15.6 |
| DiffTumor (paper, checkpoint) | 77.4 (24/31) | 75.0 (57/76) | 73.7 (42/57) | 64.2 ± 33.3 | 66.1 ± 32.8 |
| TextoMorph (1) (checkpoint) | 75.4 (23/31) | 72.4 (55/76) | 74.6 (43/57) | 65.5 ± 25.0 | 61.3 ± 28.6 |
| TextoMorph (2) (checkpoint) | 77.4 (24/31) | 75.0 (57/76) | 77.2 (44/57) | 68.4 ± 30.4 | 69.2 ± 31.0 |
| TextoMorph (3) (checkpoint) | 80.6 (25/31) | 77.6 (59/76) | 80.7 (46/57) | 69.7 ± 27.2 | 70.8 ± 26.0 |
| TextoMorph (4) (checkpoint) | 83.9 (26/31) | 77.6 (59/76) | 87.7 (50/57) | 71.6 ± 27.2 | 72.4 ± 30.3 |
Pancreas Tumor Segmentation
| Method | Tumor size d < 20 mm | 20 ≤ d < 50 mm | d ≥ 50 mm | DSC (%) | NSD (%) |
|---|---|---|---|---|---|
| RealTumor (checkpoint) | 58.3 (14/24) | 67.7 (21/31) | 57.1 (4/7) | 53.3 ± 28.7 | 40.1 ± 28.8 |
| SynTumor (paper, checkpoint) | 62.5 (15/24) | 64.5 (20/31) | 57.1 (4/7) | 54.0 ± 31.4 | 47.2 ± 23.0 |
| Pixel2Cancer (paper, checkpoint) | - | - | - | 57.9 ± 13.7 | 54.3 ± 19.2 |
| DiffTumor (paper, checkpoint) | 66.7 (16/24) | 67.7 (21/31) | 57.1 (4/7) | 58.9 ± 42.8 | 52.8 ± 26.2 |
| TextoMorph (1) (checkpoint) | 66.7 (16/24) | 64.5 (20/31) | 57.1 (4/7) | 55.8 ± 32.6 | 51.1 ± 35.6 |
| TextoMorph (2) (checkpoint) | 70.8 (17/24) | 61.3 (19/31) | 57.1 (4/7) | 59.7 ± 36.1 | 60.6 ± 38.3 |
| TextoMorph (3) (checkpoint) | 64.0 (16/24) | 70.0 (21/31) | 57.1 (4/7) | 60.2 ± 27.3 | 71.0 ± 31.5 |
| TextoMorph (4) (checkpoint) | 87.5 (21/24) | 87.1 (27/31) | 85.7 (6/7) | 67.3 ± 24.8 | 65.5 ± 27.1 |
Kidney Tumor Segmentation
| Method | Tumor size d < 20 mm | 20 ≤ d < 50 mm | d ≥ 50 mm | DSC (%) | NSD (%) |
|---|---|---|---|---|---|
| RealTumor (checkpoint) | 71.4 (5/7) | 66.7 (4/6) | 69.0 (29/42) | 78.0 ± 14.9 | 65.8 ± 17.7 |
| SynTumor (paper, checkpoint) | 71.4 (5/7) | 66.7 (4/6) | 69.0 (29/42) | 78.1 ± 23.0 | 66.0 ± 21.2 |
| Pixel2Cancer (paper, checkpoint) | - | - | - | 71.5 ± 21.4 | 64.3 ± 16.9 |
| DiffTumor (paper, checkpoint) | 71.4 (5/7) | 83.3 (5/6) | 69.0 (29/42) | 78.9 ± 19.7 | 69.2 ± 18.5 |
| TextoMorph (1) (checkpoint) | 57.1 (4/7) | 83.3 (5/6) | 69.0 (29/42) | 79.2 ± 22.3 | 71.4 ± 21.4 |
| TextoMorph (2) (checkpoint) | 71.4 (5/7) | 83.3 (5/6) | 76.2 (32/42) | 80.6 ± 21.8 | 76.8 ± 19.3 |
| TextoMorph (3) (checkpoint) | 71.4 (5/7) | 83.3 (5/6) | 73.8 (31/42) | 79.7 ± 20.2 | 75.2 ± 21.5 |
| TextoMorph (4) (checkpoint) | 71.4 (5/7) | 83.3 (5/6) | 76.2 (32/42) | 85.2 ± 9.7 | 78.4 ± 13.9 |
Note
- TextoMorph (1): Without Text Extraction and Generation, Text-Driven Contrastive Learning, and Targeted Data Augmentation.
- TextoMorph (2): With Text Extraction and Generation only.
- TextoMorph (3): With Text Extraction and Generation and Text-Driven Contrastive Learning.
- TextoMorph (4): With Text Extraction and Generation, Text-Driven Contrastive Learning, and Targeted Data Augmentation.
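For readers less familiar with the metrics in the tables above, here is a small sketch of two of the quantities. DSC is the Dice similarity coefficient, 2|A∩B| / (|A| + |B|), between a predicted and a reference mask. The per-size columns appear to give a percentage with the underlying counts in parentheses; reading them that way is an assumption, not something this README states. The function names are hypothetical, the masks are flattened toy lists rather than CT volumes, and this is not the repository's evaluation code.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty; treated as perfect agreement here
    return 2.0 * intersection / total


def percent_from_counts(hits: int, total: int) -> float:
    """Percentage with one decimal place, e.g. for a 'x (hits/total)' cell."""
    return round(100.0 * hits / total, 1)


# Toy masks: one overlapping voxel, two tumor voxels in each mask.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(percent_from_counts(20, 31))             # 64.5
```

The ± values in the DSC and NSD columns would then correspond to a mean and spread over test cases, which a per-case loop over `dice_score` could reproduce under the same assumptions.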
cd Segmentation/TumorGeneration/model_weight
wget https://huggingface.co/MrGiovanni/DiffTumor/resolve/main/AutoencoderModel/AutoencoderModel.ckpt
wget https://huggingface.co/Alena-Xinran/DescriptiveTumor/resolve/main/descriptivetumor2/liver.pt
wget https://huggingface.co/Alena-Xinran/DescriptiveTumor/resolve/main/descriptivetumor2/pancreas.pt
wget https://huggingface.co/Alena-Xinran/DescriptiveTumor/resolve/main/descriptivetumor2/kidney.pt
cd ../..
cd Segmentation
healthy_datapath="/ccvl/net/ccvl15/xinran/" # your-datapath
datapath="/ccvl/net/ccvl15/xinran/" # your-datapath
cache_rate=1.0
batch_size=12
val_every=50
workers=12
organ=liver
fold=0
backbone=unet
logdir="runs/$organ.fold$fold.$backbone"
datafold_dir=cross_eval/"$organ"_aug_data_fold/
dist=$((RANDOM % 99999 + 10000))
python -W ignore main.py --model_name $backbone --cache_rate $cache_rate --dist-url=tcp://127.0.0.1:$dist --workers $workers --max_epochs 2000 --val_every $val_every --batch_size=$batch_size --save_checkpoint --distributed --noamp --organ_type $organ --organ_model $organ --tumor_type tumor --fold $fold --ddim_ts 50 --logdir=$logdir --healthy_data_root $healthy_datapath --data_root $datapath --datafold_dir $datafold_dir
cd Segmentation
datapath="/ccvl/net/ccvl15/xinran/" #your-datapath
organ=liver
fold=0
datafold_dir=cross_eval/"$organ"_aug_data_fold/
python -W ignore validation.py --model=unet --data_root $datapath --datafold_dir $datafold_dir --tumor_type tumor --organ_type $organ --fold $fold --log_dir $organ/$organ.fold$fold.unet --save_dir out/$organ/$organ.fold$fold.unet
@article{li2024text,
title={Text-Driven Tumor Synthesis},
author={Li, Xinran and Shuai, Yi and Liu, Chen and Chen, Qi and Wu, Qilong and Guo, Pengfei and Yang, Dong and Zhao, Can and Bassi, Pedro RAS and Xu, Daguang and others},
journal={arXiv preprint arXiv:2412.18589},
year={2024},
url={https://github.com/MrGiovanni/TextoMorph}
}
@article{chen2024analyzing,
title={Analyzing tumors by synthesis},
author={Chen, Qi and Lai, Yuxiang and Chen, Xiaoxi and Hu, Qixin and Yuille, Alan and Zhou, Zongwei},
journal={Generative Machine Learning Models in Medical Image Computing},
pages={85--110},
year={2024},
publisher={Springer}
}
@inproceedings{chen2024towards,
title={Towards Generalizable Tumor Synthesis},
author={Chen, Qi and Chen, Xiaoxi and Song, Haorui and Xiong, Zhiwei and Yuille, Alan and Wei, Chen and Zhou, Zongwei},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024},
url={https://github.com/MrGiovanni/DiffTumor}
}
This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research and the Patrick J. McGovern Foundation Award.
