Official PyTorch implementation and pretrained models for paper: "RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models."
- [2024/01/31] Accepted to ICRA 2024!
- [2024/02/15] Updated fine-tuned checkpoints for the identification and defect-detection tasks.
- BEiT3-base: #layers=12; hidden=768; FFN factor=4x; #heads=12; patch=16x16; #parameters: 222M
- BEiT3-large: #layers=24; hidden=1024; FFN factor=4x; #heads=16; patch=16x16; #parameters: 674M
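As a rough sanity check on these configs, the backbone size can be estimated from the hyperparameters alone. The sketch below assumes a multiway block (one shared self-attention plus two modality-specific 4x FFNs) and ignores embedding tables, biases, and norms, so it lands below the reported 222M/674M totals:

```python
# Rough parameter-count estimate from the configs above.
# Assumptions (not from the paper): one shared attention (~4*h^2 weights)
# and two 4x FFNs (~8*h^2 each) per multiway block; embeddings excluded.

def block_params(hidden: int) -> int:
    attn = 4 * hidden * hidden   # Q, K, V, and output projections
    ffn = 8 * hidden * hidden    # 4x expansion: up- plus down-projection
    return attn + 2 * ffn        # two modality-specific FFNs per block

def backbone_params(layers: int, hidden: int) -> int:
    return layers * block_params(hidden)

print(f"base  ~ {backbone_params(12, 768) / 1e6:.0f}M")   # below 222M: embeddings excluded
print(f"large ~ {backbone_params(24, 1024) / 1e6:.0f}M")  # below 674M for the same reason
```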
`beit3.spm` is the SentencePiece model used for tokenizing text:

```python
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer("/your_beit3_model_path/beit3.spm")
```
Start a Docker container with the required environment:

```shell
alias=`whoami | cut -d'.' -f2`; docker run -it --rm --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel bash
```
Install the required packages:

```shell
pip install -r requirements.txt
```
Download json_files.zip and put it into the ArmBench dataset directory.
Additional JSON files for the 3-to-1 task: ID_json_3t1.zip.
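Assuming the zips have been downloaded next to the dataset, unpacking them might look like this (the `armbench` directory name is a placeholder for your dataset root):

```shell
# Hypothetical layout: point ARMBENCH_DIR at your ArmBench dataset root.
ARMBENCH_DIR="${ARMBENCH_DIR:-./armbench}"
mkdir -p "$ARMBENCH_DIR"

# Unpack the extra annotation files into the dataset root, if present.
if [ -f json_files.zip ]; then
    unzip -o json_files.zip -d "$ARMBENCH_DIR"
fi
if [ -f ID_json_3t1.zip ]; then
    unzip -o ID_json_3t1.zip -d "$ARMBENCH_DIR"
fi
```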
```shell
python armbench/ID.py --model 'beit3_base_patch16_224' --input_size 224 --task 'armbenchpick1to1' --batch_size 128 \
--layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
--data_path 'path/to/your/dataset' --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05 \
--save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'
```
- `model` specifies the name of the model used in the experiments.
- `log_dir` is the directory that stores the output log.
- `task` selects the task: `armbenchpick1to1` uses only pre-pick images; `armbench3t1` uses both pre-pick and post-pick images.
- `data_path` is the directory that stores the dataset.
- `finetune` is the path to the pretrained BEiT-3 weights.
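The `--layer_decay 0.65` flag refers to layer-wise learning-rate decay, as commonly used in BEiT-style fine-tuning: each earlier transformer layer gets a smaller learning rate than the one above it. A minimal sketch of the usual scaling rule (the exact implementation in `ID.py` may differ):

```python
# Layer-wise LR decay sketch (assumed BEiT-style behavior, not taken
# verbatim from this repo): layer i of an L-layer encoder is scaled by
# decay ** (L + 1 - i), so the embedding layer moves slowest and the
# task head keeps the full base learning rate.

def layer_lr_scales(num_layers: int, decay: float) -> list[float]:
    # index 0 = embeddings, index num_layers + 1 = task head
    return [decay ** (num_layers + 1 - i) for i in range(num_layers + 2)]

scales = layer_lr_scales(12, 0.65)  # 12 layers, as in beit3_base
base_lr = 2e-4                      # --lr from the command above
print(f"embedding lr: {base_lr * scales[0]:.2e}")   # smallest
print(f"head lr:      {base_lr * scales[-1]:.2e}")  # full base lr
```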
```shell
python armbench/defection_id.py --model 'beit3_base_patch16_224' --input_size 224 --task 'defection1by1' --batch_size 128 \
--layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
--data_path 'path/to/your/dataset' --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05 \
--save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'
```
- `model` specifies the name of the model used in the experiments.
- `log_dir` is the directory that stores the output log.
- `task` selects `defection1by1` for defect detection.
- `data_path` is the directory that stores the dataset.
- `finetune` is the path to the pretrained BEiT-3 weights.
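The `--warmup_epochs 3` and `--epochs 30` flags imply a warmup-then-decay learning-rate schedule. A sketch of the linear-warmup-plus-cosine-decay shape typical of BEiT fine-tuning scripts (the exact schedule in `defection_id.py` is an assumption):

```python
import math

# Warmup + cosine schedule sketch, using the values from the command
# above (--lr 2e-4, --warmup_epochs 3, --epochs 30). min_lr is assumed.

def lr_at_epoch(epoch: float, base_lr: float = 2e-4,
                warmup_epochs: int = 3, total_epochs: int = 30,
                min_lr: float = 1e-6) -> float:
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs  # linear warmup from 0
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at_epoch(0))   # start of warmup
print(lr_at_epoch(3))   # peak: the base lr
print(lr_at_epoch(30))  # decayed to min_lr
```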
If you find this repository useful, please consider citing our work:
```bibtex
@misc{long2023robollm,
      title={RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models},
      author={Zijun Long and George Killick and Richard McCreadie and Gerardo Aragon Camarasa},
      year={2023},
      eprint={2310.10221},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}
```
This repository is built on the BEiT, BEiT v2, BEiT-3, CLIP, open_clip, Oscar, DeiT, and DINO repositories, as well as the timm library.
This project is licensed under the license found in the LICENSE file in the root directory of this source tree.