Code for the paper "Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI". Our project website is ManipMob-MMKG.
Given the scene profiles, we use LLMs to design a prompt-based schema, and then populate multimodal knowledge guided by that schema to construct our Scene-MMKG. Scene-MMKG is further refined by attribute hierarchicalization and aggregation to resolve long-tail problems.
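The prompt-based schema-induction step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual pipeline: the `query_llm` helper, the prompt wording, and the canned JSON response are all hypothetical stand-ins for a real LLM client.

```python
# Minimal sketch of prompt-based schema induction for a Scene-MMKG.
# `query_llm` is a hypothetical helper; swap in a real LLM API client.
import json

def query_llm(prompt: str) -> str:
    """Stubbed LLM call; replace with a real client in practice."""
    # Canned response for illustration only (not real model output).
    return json.dumps({
        "entity_types": ["object", "room", "agent"],
        "relations": ["locatedIn", "hasAttribute", "usedFor"],
    })

def induce_schema(scene_profile: str) -> dict:
    """Ask the LLM to propose entity types and relations for a scene."""
    prompt = (
        "Given the scene profile below, list the entity types and "
        "relations for a scene knowledge graph, as JSON with keys "
        "'entity_types' and 'relations'.\n\n"
        f"Scene profile: {scene_profile}"
    )
    return json.loads(query_llm(prompt))

schema = induce_schema("indoor kitchen with movable appliances")
print(schema["relations"])
```

The induced schema would then guide which multimodal facts (e.g. object-room relations, visual attributes) are retrieved and populated into the graph.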
An overview of the scene-driven knowledge enhancement model is shown in the left panel; the right panel details the scene knowledge retrieval module.
For the three sub-projects, please see the corresponding subfolders for environment requirements, data downloads, and run steps:
- Scene MMKG Construct: the specific steps for constructing the Scene-MMKG.
- Visual Language Navigation ManiMob: injecting ManiMob-MMKG into the Visual Language Navigation (VLN) task.
- 3D Object Language Grounding ManiMob: injecting ManiMob-MMKG into the 3D Object Language Grounding task.
The code for the two downstream tasks in this work is built upon CKR and snare. We thank the authors for their great work!
```bibtex
@misc{yaoxian2023scenedriven,
  title={Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI},
  author={Song Yaoxian and Sun Penglei and Liu Haoyu and Li Zhixu and Song Wei and Xiao Yanghua and Zhou Xiaofang},
  year={2023},
  eprint={2311.03783},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```