My simple implementation of "VIMA: General Robot Manipulation with Multimodal Prompts"

kyegomez/VIMA

Multi-Modality

VIMA

A simple implementation of "VIMA: General Robot Manipulation with Multimodal Prompts"

Original implementation: Link

Appreciation

  • Lucidrains
  • Agorians

Install

pip install vima


Usage

import torch
from vima import Vima

# Pick a device so the model and input tensors live in the same place
device = "cuda" if torch.cuda.is_available() else "cpu"

# Generate a random input token sequence
x = torch.randint(0, 256, (1, 1024)).to(device)

# Initialize the VIMA model on the same device
model = Vima().to(device)

# Pass the input sequence through the model
output = model(x)

MultiModal Iteration

  • Pass text and image tensors into VIMA
import torch
from vima.vima import VimaMultiModal

# Random image and text-token inputs
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

# Initialize the multimodal model and run a forward pass
model = VimaMultiModal()
output = model(text, img)
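For intuition: in VIMA, a multimodal prompt is an interleaved sequence of text tokens and image segments flattened into one token stream. Below is a minimal pure-Python sketch of that idea (the placeholder id, `flatten_prompt`, and the example segments are hypothetical illustrations, not the `vima` package's API):

```python
# Sketch: flatten interleaved ('text', ids) / ('image', crop) segments
# into one token sequence, marking each image with a placeholder id.
# IMG_TOKEN and flatten_prompt are hypothetical, for illustration only.

IMG_TOKEN = -1  # placeholder id marking where image features are injected

def flatten_prompt(segments):
    """Flatten a list of ('text', [ids]) / ('image', obj) segments."""
    tokens = []
    images = []
    for kind, payload in segments:
        if kind == "text":
            tokens.extend(payload)
        elif kind == "image":
            tokens.append(IMG_TOKEN)
            images.append(payload)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return tokens, images

prompt = [
    ("text", [12, 45, 7]),       # e.g. "put the"
    ("image", "object_crop_0"),  # image of the target object
    ("text", [99]),              # e.g. "into"
    ("image", "scene_crop_1"),   # image of the container
]
tokens, images = flatten_prompt(prompt)
# tokens -> [12, 45, 7, -1, 99, -1]; images -> the two crops, in order
```

At each `IMG_TOKEN` position, a real model would splice in the corresponding image's embedding rather than a literal token id.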

License

MIT

Citations

@inproceedings{jiang2023vima,
  title     = {VIMA: General Robot Manipulation with Multimodal Prompts},
  author    = {Yunfan Jiang and Agrim Gupta and Zichen Zhang and Guanzhi Wang and Yongqiang Dou and Yanjun Chen and Li Fei-Fei and Anima Anandkumar and Yuke Zhu and Linxi Fan},
  booktitle = {Fortieth International Conference on Machine Learning},
  year      = {2023}
}
