The integration of medical imaging and clinical text has enabled the emergence of generalist artificial intelligence (AI) systems for healthcare. However, pervasive biases, such as imbalanced disease prevalence, skewed anatomical region distributions, heterogeneous imaging protocols, and demographic disparities, pose significant challenges to the fairness and reliability of vision-language systems in real-world clinical settings. Here we present BiasCareVL, a bias-aware multimodal learning framework that introduces bias control directly into model design, rather than treating it as a post hoc correction.
BiasCareVL incorporates adaptive uncertainty modeling with optional human-in-the-loop refinement to regulate the influence of dominant data patterns and to promote equitable reasoning under distributional imbalance. Trained on 3.44 million samples spanning over 15 imaging modalities, the framework supports diverse clinical tasks, including visual question answering, disease classification, segmentation, and report generation, within a unified representation space. Across eight public benchmarks covering dermatology, oncology, radiology, and pathology, BiasCareVL consistently outperforms 20 state-of-the-art methods, with pronounced gains in clinically challenging scenarios, including over 10% accuracy improvement in multi-class skin lesion diagnosis and more than 20% Dice improvement in small tumor segmentation. Furthermore, when evaluated alongside board-certified radiologists, BiasCareVL achieves diagnostic accuracy exceeding that of the human experts while requiring substantially less time.
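The actual bias-aware objective is defined in the paper and the code. Purely as an illustration of the general idea of regulating dominant data patterns through uncertainty, the sketch below reweights a per-sample cross-entropy loss by predictive entropy; every name in it is hypothetical and it is not the BiasCareVL implementation.

```python
# Illustrative sketch only: a generic uncertainty-weighted loss in PyTorch.
# It is NOT the BiasCareVL objective; the weighting scheme is an assumption
# used to convey the idea of downweighting dominant, low-uncertainty patterns.
import torch
import torch.nn.functional as F


def uncertainty_weighted_ce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Per-sample cross-entropy (no reduction) so each sample can be reweighted.
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Predictive entropy as a simple per-sample uncertainty estimate.
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    # Normalise so the weights average to 1 and the loss scale stays comparable.
    weights = entropy / entropy.mean().clamp_min(1e-12)
    # Detach the weights so uncertainty modulates the loss without adding a gradient path.
    return (weights.detach() * ce).mean()


if __name__ == "__main__":
    logits = torch.randn(8, 5)            # e.g. 8 samples, 5 disease classes
    targets = torch.randint(0, 5, (8,))
    print(uncertainty_weighted_ce(logits, targets))
```

In this toy scheme, confident predictions, which typically correspond to well-represented patterns, receive smaller weights, so under-represented or ambiguous cases contribute relatively more to each gradient step.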
```bash
git clone https://github.com/lich0031/BiasCareVL
cd BiasCareVL
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Installation is fast, typically taking less than half an hour.
- Linux is recommended.
- Python 3.10 is recommended.
- NVIDIA GPU(s) with CUDA support are recommended for practical use.
- Main dependencies are listed in `requirements.txt`.
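To confirm that the environment can see a GPU, a quick check like the one below can help; it assumes PyTorch ends up installed via `requirements.txt`, which is common for this kind of codebase but not verified here.

```python
# Optional sanity check that the environment sees a CUDA GPU.
# Assumes PyTorch is among the dependencies in requirements.txt.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```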
Before running the code, update local paths as needed. In particular, check:
- `--version`
- `--vision_pretrained`
- `--dataset_dir`
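For reference, the snippet below spells out these three flags as a standalone argparse example. The real scripts in this repository define the flags themselves, and the help strings are assumptions about their roles; check cmd.sh for the actual entry points and values.

```python
# Illustrative only: the three path-related flags referenced above.
# The help strings are assumed meanings, not taken from the repository.
import argparse

parser = argparse.ArgumentParser(description="Local paths to check before running")
parser.add_argument("--version", type=str, default="biascarevl",
                    help="model/experiment version tag (assumed meaning)")
parser.add_argument("--vision_pretrained", type=str, required=True,
                    help="local path to pretrained vision encoder weights (assumed meaning)")
parser.add_argument("--dataset_dir", type=str, required=True,
                    help="root directory of the downloaded datasets (assumed meaning)")

# Example values; replace with your own local paths.
args = parser.parse_args([
    "--vision_pretrained", "/path/to/vision_pretrained.pth",
    "--dataset_dir", "/path/to/datasets",
])
print(args)
```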
Released checkpoints are available at:
BiasCareVL weights: Google Drive
The codebase references the following public datasets:
- PubMedVision: Link
- IMed-361M: Link
- MIMIC-CXR-JPG v2.0.0: Link
- PMC-VQA: Link
- VQA-RAD: Link
- PathVQA: Link
- CXR-LT 2024: Link
- ISIC 2018: Link
Please download the datasets you need and organize them according to the directory structure expected by the data-loading code.
Training and inference commands are provided in cmd.sh.
Please update local checkpoint and dataset paths in that file before running.
Expected inference outputs are written under `./runs/<exp_name>/`:
- `cls_metrics.json`
- `dir_metrics.json`
- `visualization/`
- `results/*.json`
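Here is a minimal sketch for inspecting the metric files after a run; it assumes they are plain JSON, and the run directory name and metric keys will depend on your experiment and task.

```python
# Minimal sketch: load and print the metric files produced by an inference run.
# Assumes the files under ./runs/<exp_name>/ are plain JSON.
import json
from pathlib import Path

run_dir = Path("./runs/my_experiment")  # replace "my_experiment" with your <exp_name>

for name in ("cls_metrics.json", "dir_metrics.json"):
    path = run_dir / name
    if path.exists():
        with path.open() as f:
            print(name, json.dumps(json.load(f), indent=2))
    else:
        print(f"{name} not found; it may only be produced for matching tasks")
```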
Inference speed depends on the data, task, and hardware. For example, disease classification inference for CXR-LT 2024 Task 2 (406 images) on a server equipped with one A800 GPU takes 0.24 hours (about 14 minutes), including model loading and inference.
For questions about the code release or implementation, please contact cheng.li6@siat.ac.cn or ss.wang@siat.ac.cn.
This project is released under the license in LICENSE.
This repository builds on or includes components related to:
- HuatuoGPT-Vision
- Qwen2.5-VL
- SAM
- IMIS
We gratefully acknowledge the developers and contributors of these publicly available works, as well as of the datasets listed above, which have collectively enabled our research.
If you find this repository useful, please consider citing:
@article{biascarevl,
title = {Bias-constrained multimodal intelligence for equitable and reliable clinical AI},
author = {Cheng Li and Weijian Huang and Jiarun Liu and Hao Yang and Qi Yang and Song Wu and Ye Li and Hairong Zheng and Shanshan Wang},
journal = {arXiv:2604.16884},
year = {2026}
}