VIMA

A simple implementation of "VIMA: General Robot Manipulation with Multimodal Prompts"

Appreciation

Lucidrains
Agorians

Install

pip install vima

Usage

import torch
from vima import Vima

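# Use the GPU when available; the input and the model must be on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Generate a random input sequence of token IDs in [0, 256)
x = torch.randint(0, 256, (1, 1024), device=device)

# Initialize the VIMA model on the same device
model = Vima().to(device)

# Pass the input sequence through the model
output = model(x)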

MultiModal Iteration

Pass text and image tensors into VimaMultiModal:

import torch
from vima.vima import VimaMultiModal

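# Example inputs: one random RGB image (256x256) and 1024 random token IDs
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

# Initialize the multimodal model and run it on the text and image
model = VimaMultiModal()
output = model(text, img)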
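
The shapes and dtypes of the two inputs above can be easy to get wrong, so here is a minimal sketch that builds them in one place. It only uses PyTorch; `make_dummy_inputs` and its parameters are hypothetical names for illustration and are not part of the vima package. The default sizes simply mirror the values in the example above.

```python
import torch


def make_dummy_inputs(batch_size=1, seq_len=1024, vocab_size=20000, image_size=256):
    """Build random (text, img) tensors shaped like the example above.

    Hypothetical helper: not part of the vima package.
    """
    # Integer token IDs in [0, vocab_size), shape (batch, seq_len)
    text = torch.randint(0, vocab_size, (batch_size, seq_len))
    # Float RGB image batch, shape (batch, 3, height, width)
    img = torch.randn(batch_size, 3, image_size, image_size)
    return text, img


text, img = make_dummy_inputs()
print(text.shape, text.dtype)
print(img.shape, img.dtype)
```

The resulting tensors can be passed to the model in the same order as above, i.e. `model(text, img)`.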

License

MIT

Citations

@inproceedings{jiang2023vima,
  title     = {VIMA: General Robot Manipulation with Multimodal Prompts},
  author    = {Yunfan Jiang and Agrim Gupta and Zichen Zhang and Guanzhi Wang and Yongqiang Dou and Yanjun Chen and Li Fei-Fei and Anima Anandkumar and Yuke Zhu and Linxi Fan},
  booktitle = {Fortieth International Conference on Machine Learning},
  year      = {2023}
}
