2 OpenGVLab, Shanghai AI Laboratory
* Equal Contribution † Corresponding Author
Point as A Foreign Language, Let Large Language Models (LLMs) Perceive 3D Physical World as Reading Articles!
- 2023.7.31: GitHub repository initialized. The paper will be released very soon.
We propose to utilize pretrained language models for point cloud understanding. Different from existing methods that use images as an intermediate representation, we found that language models can read point clouds as a foreign language. Benefiting from pretraining on a large-scale corpus, language models perform better on long-tailed and out-of-distribution tasks in 3D vision.
- 💡 For multimodal research, our method explores the underlying representation relationship between different modalities, specifically, language and 3D point cloud, and demonstrates that models pretrained on natural language can read 3D point clouds.
- 💡 For 3D vision research, our method performs end-to-end point cloud understanding without hand-crafted structure designs. It also demonstrates the feasibility of using natural-language corpora as pretraining data for 3D vision.
- 💡 For the vision-language area, our method experimentally validates that 3D point clouds and text can be encoded by the same parameters. This opens a promising new direction for tasks involving modality alignment between text and point clouds.
- 💡 With outstanding performance across benchmarks including ModelNet-40, S3DIS, and ShapeNetPart, our method demonstrates its effectiveness on both coarse-grained and fine-grained 3D point cloud tasks.
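To make the "foreign language" analogy concrete, the sketch below shows one common way a point cloud can be turned into a sequence of tokens: pick spread-out centers with farthest point sampling, then group each center's nearest neighbors into a patch. This README does not describe the project's actual tokenizer, so everything here is an assumption for illustration only: the function names `farthest_point_sampling` and `point_cloud_to_tokens`, and the parameters `num_tokens` and `patch_size`, are hypothetical.

```python
import math
import random


def farthest_point_sampling(points, num_centers):
    """Return indices of num_centers points that are spread out over the cloud."""
    centers = [0]  # start from the first point (an arbitrary choice)
    # Distance from every point to its nearest already-chosen center.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < num_centers:
        # The next center is the point farthest from all chosen centers.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return centers


def point_cloud_to_tokens(points, num_tokens=8, patch_size=4):
    """Group a point cloud into num_tokens local patches (hypothetical tokenizer)."""
    tokens = []
    for c in farthest_point_sampling(points, num_tokens):
        cx, cy, cz = points[c]
        # The patch is the center plus its nearest neighbors.
        nearest = sorted(points, key=lambda p: math.dist(p, points[c]))[:patch_size]
        # Store coordinates relative to the center and flatten into one vector.
        tokens.append([v for (x, y, z) in nearest for v in (x - cx, y - cy, z - cz)])
    return tokens


random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(64)]
tokens = point_cloud_to_tokens(cloud)
print(len(tokens), len(tokens[0]))  # 8 patches, each 4 points * 3 coordinates
```

Under this assumption, each flattened patch plays the role of a "word"; a learned projection (not shown) would map it to the embedding width of the language model before the sequence is fed in.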
- Support Billion-scale Large Language Models.
- Large Language Model with More Modalities.
- Support Outdoor LiDAR Scenes.
If you are interested in this project, you are welcome to contribute!
To contact us, send an email to yiyuanzhang.ai@gmail.com, kaixionggong@gmail.com, or xyyue@ie.cuhk.edu.hk.
This project is released under the Apache 2.0 license.
This code is built on the excellent open-source project OpenPoints.
