Thanks very much for releasing such insightful work!
We have developed a project based on ImageBind, called Point-Bind, which aligns the 3D point cloud modality with image, text, and audio. Our project has four main features:
- Align 3D with ImageBind. With a joint embedding space, 3D objects can be aligned with their corresponding 2D images, textual descriptions, and audio.
- 3D LLM via LLaMA-Adapter. In Multi-modal LLaMA-Adapter (ImageBind-LLM), we introduce an LLM that follows 3D instructions in English and Chinese.
- 3D Zero-shot Classify/Seg/Det. Point-Bind achieves state-of-the-art performance for 3D zero-shot tasks, including classification, segmentation, and detection.
- Embedding Arithmetic with 3D. We observe that 3D features from Point-Bind can be added to features from other modalities to compose their semantics.
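
To make the embedding arithmetic idea concrete, here is a minimal sketch in plain Python. It does not use the Point-Bind or ImageBind code: the three-dimensional vectors are made-up stand-ins for real embeddings, and the helper names (`l2_normalize`, `compose`, `cosine`, `retrieve`) are hypothetical. It only illustrates the general recipe of normalizing two embeddings from a shared space, adding them, and retrieving the nearest gallery item by cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def compose(emb_a, emb_b):
    """Add two embeddings from the same joint space and re-normalize the sum."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    return l2_normalize([x + y for x, y in zip(a, b)])


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(u), l2_normalize(v)))


def retrieve(query, gallery):
    """Return the gallery key whose embedding is most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))


# Toy stand-ins (assumed values, not real model outputs).
point_cloud_car = [1.0, 0.0, 0.0]   # pretend 3D embedding of a car
audio_waves = [0.0, 1.0, 0.0]       # pretend audio embedding of sea waves

gallery = {
    "car on a road": [1.0, 0.0, 0.1],
    "waves on a beach": [0.0, 1.0, 0.1],
    "car by the sea": [1.0, 1.0, 0.0],
}

query = compose(point_cloud_car, audio_waves)
best = retrieve(query, gallery)
print(best)
```

With real models, the toy vectors would be replaced by the encoder outputs for a point cloud and an audio clip, and the gallery by image embeddings.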
The Multi-modality LLaMA-Adapter (ImageBind-LLM) with Point-Bind's 3D embeddings is as follows:

Thanks!