The MedShapeNet foundation model is the first multi-modal foundation model for medical point cloud completion and serves as a foundation for future research in this area. It is designed to handle incomplete 3D point cloud data and reconstruct the full shape of various medical structures. By combining both 3D point cloud data and textual data, this model enhances accuracy in shape reconstruction, supporting more precise analysis and potential applications in extended reality (XR) for medicine and custom bone implant design.
- **Transformer-Based Autoencoder** for efficient feature extraction and 3D point cloud completion.
- **Multi-Modal Integration** of text data using BERT to enhance point cloud reconstruction.
- **Density-Aware Chamfer Distance Loss** tailored for handling varying point densities.
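To make the density-aware loss concrete, the sketch below implements a simplified variant in NumPy: point-to-point distances are mapped through 1 − exp(−α·d), and each match is down-weighted by how often its target point is selected, so densely sampled regions do not dominate the loss. The function name, the α value, and the exact weighting scheme are illustrative assumptions; the model's actual training loss may differ in detail.

```python
import numpy as np

def density_aware_chamfer(pred, gt, alpha=1000.0):
    """Simplified density-aware Chamfer distance between two (N, 3) clouds.

    Distances are squashed through 1 - exp(-alpha * d^2) and weighted by
    the inverse match count of each target point, so a point matched many
    times contributes less per match. Illustrative only.
    """
    # Pairwise squared distances: (N_pred, N_gt)
    d = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(axis=-1)

    # Nearest neighbours in both directions
    idx_p2g, d_p2g = d.argmin(axis=1), d.min(axis=1)  # pred -> gt
    idx_g2p, d_g2p = d.argmin(axis=0), d.min(axis=0)  # gt -> pred

    # How many times each point is selected as a nearest neighbour
    n_gt = np.bincount(idx_p2g, minlength=gt.shape[0]).astype(float)
    n_pred = np.bincount(idx_g2p, minlength=pred.shape[0]).astype(float)

    # Density weight: frequently matched targets contribute less
    w_p2g = np.maximum(n_gt[idx_p2g], 1.0) ** -1
    w_g2p = np.maximum(n_pred[idx_g2p], 1.0) ** -1

    term_p2g = (1.0 - w_p2g * np.exp(-alpha * d_p2g)).mean()
    term_g2p = (1.0 - w_g2p * np.exp(-alpha * d_g2p)).mean()
    return (term_p2g + term_g2p) / 2.0
```

For identical clouds the loss is exactly zero (every point matches itself once with distance zero), and it grows as the predicted cloud drifts away from the ground truth, which is the behaviour a completion loss needs.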
**Dataset: MedShapeNet**
The model is trained on the MedShapeNet dataset, a comprehensive collection of over 100,000 3D medical shapes. This dataset encompasses a wide range of medical structures, including organs, vessels, bones, instruments, and more, spanning 240 distinct classes.
To create a robust training set for our model:
- We extracted point clouds from the vertices of each 3D mesh file in MedShapeNet.
- Point extraction is explained in detail here.
- To simulate real-world scenarios where data might be incomplete, we introduced defects by removing points from each point cloud. This created an "incomplete" input that the model aims to reconstruct.
- Each point cloud was processed twice in this way, generating a total of 200,000 point clouds for our dataset. 90% were designated for training and 10% for validation.
- Defect injection and augmentation are explained here.
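The preprocessing steps above can be sketched roughly as follows: take the vertices of a mesh as a point cloud, then carve out a contiguous region around a random seed point to simulate an incomplete scan. The removal fraction, the spherical-hole strategy, and the function names here are assumptions for illustration; the linked documentation describes the actual protocol.

```python
import numpy as np

def inject_defect(points, removal_fraction=0.25, rng=None):
    """Remove a contiguous region of points around a random seed point,
    simulating an incomplete acquisition. The removal fraction and the
    spherical-hole shape are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    seed = points[rng.integers(len(points))]
    # Distance of every point to the randomly chosen seed
    dist = np.linalg.norm(points - seed, axis=1)
    n_remove = int(removal_fraction * len(points))
    # Drop the n_remove points closest to the seed (a spherical "hole")
    keep = np.argsort(dist)[n_remove:]
    return points[keep]

# In practice the cloud would come from a mesh's vertices, e.g. (assumption):
#   points = trimesh.load("liver.stl").vertices
rng = np.random.default_rng(0)
cloud = rng.standard_normal((2048, 3))       # stand-in for mesh vertices
partial_a = inject_defect(cloud, rng=rng)    # two defective variants per
partial_b = inject_defect(cloud, rng=rng)    # shape -> 200,000 clouds total
```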
To enhance the model's interpretative ability, we provided class names as textual input. This allows the model to differentiate between classes, such as distinguishing a healthy liver from a tumorous liver, adding a layer of semantic understanding to its point cloud completion.
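One common way to condition a point cloud network on such textual input is to embed the class name once per shape and attach that embedding to every per-point feature. The sketch below shows this fusion step with a placeholder text vector; the function name and concatenation strategy are assumptions, and in the actual model the text embedding comes from BERT.

```python
import numpy as np

def fuse_text_and_points(point_feats, text_emb):
    """Tile a per-shape text embedding (e.g. BERT's [CLS] vector for the
    class name) and concatenate it onto every per-point feature vector.
    A sketch of one common fusion strategy; the model's fusion layer may
    differ."""
    n_points = point_feats.shape[0]
    text_tiled = np.tile(text_emb[None, :], (n_points, 1))
    return np.concatenate([point_feats, text_tiled], axis=1)

# In practice the 768-d text vector would come from BERT, e.g. (assumption):
#   inputs = tokenizer("tumorous liver", return_tensors="pt")
#   text_emb = bert(**inputs).last_hidden_state[:, 0]  # [CLS] token

point_feats = np.zeros((1024, 256))  # per-point features from the encoder
text_emb = np.ones(768)              # placeholder for a BERT embedding
fused = fuse_text_and_points(point_feats, text_emb)  # shape (1024, 1024)
```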
The MedShapeNet foundation model demonstrates the potential of multi-modal learning in medical applications, bridging 3D shape data with textual descriptors to improve the quality and accuracy of shape completion in medical imaging.
- A small demo dataset of 220 point clouds has been preprocessed and can be used directly with the model for inference.
- This is a notebook on how to build and train the model.
- This is a notebook on how to use the model to run inference and visualize the results.
The model weights are available here.
The paper can be accessed here and on ResearchGate.





