longkukuhi/RoboLLM
Official PyTorch implementation and pretrained models for the paper: "RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models."

Updates

  • [2024/01/31] Accepted to ICRA 2024!
  • [2024/02/15] Updated fine-tuned checkpoints for the identification and defect detection tasks.

Setup

Download pre-trained Checkpoints

  • BEiT3-base: #layer=12; hidden=768; FFN factor=4x; #head=12; patch=16x16; #parameters: 222M
  • BEiT3-large: #layer=24; hidden=1024; FFN factor=4x; #head=16; patch=16x16; #parameters: 674M
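As a rough sanity check on the quoted sizes, a back-of-envelope count for the base model can be sketched in Python. Every figure here is a simplifying assumption: standard attention at 4·h² weights, a 4x FFN at 8·h² weights that BEiT-3's Multiway design duplicates per modality, and an approximately 64k-entry embedding table.

```python
# Back-of-envelope parameter count for BEiT3-base (all figures assumed):
# attention ~ 4*h^2, one 4x-factor FFN ~ 8*h^2, two Multiway FFN experts
# per layer, plus a ~64k-token embedding table. Biases and norms ignored.
layers, hidden, vocab = 12, 768, 64_000

attn = 4 * hidden ** 2            # Q, K, V and output projections
ffn = 8 * hidden ** 2             # up- and down-projection of a 4x FFN
per_layer = attn + 2 * ffn        # two modality-specific FFN experts
total = layers * per_layer + vocab * hidden

print(f"~{total / 1e6:.0f}M parameters")  # → ~191M parameters
```

The count omits biases, layer norms, and the vision patch embedding, which is part of why it undershoots the quoted 222M; it is only meant to show the sizes are in the right ballpark.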

Download Text Tokenizer

beit3.spm is the SentencePiece model used to tokenize text. Load it with the XLM-RoBERTa tokenizer class from transformers:

from transformers import XLMRobertaTokenizer

# Point the tokenizer at the downloaded beit3.spm vocabulary file
tokenizer = XLMRobertaTokenizer("/your_beit3_model_path/beit3.spm")

Set up the environment

alias=`whoami | cut -d'.' -f2`
docker run -it --rm --runtime=nvidia --ipc=host --privileged \
  -v /home/${alias}:/home/${alias} \
  pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel bash

Install the required packages:

pip install -r requirements.txt

Download our preprocessed JSON files

For the Armbench identification task

Download json_files.zip and extract it into the Armbench dataset directory.

Additional JSON files for the 3to1 task: ID_json_3t1.zip

(Optional) Download our fine-tuned checkpoints

For the Armbench identification task

RoboLLM Base, whole gallery

RoboLLM Base, within basket

For the Armbench defect detection task

RoboLLM Base

RoboLLM Large

Object Identification

python armbench/ID.py --model 'beit3_base_patch16_224' --input_size 224 --task 'armbenchpick1to1' --batch_size 128 \
 --layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
 --data_path 'path/to/your/dataset' --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05  \
 --save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'
  • model specifies the name of the model used in this experiment.
  • log_dir is the directory that stores the output logs.
  • task selects the setup: armbenchpick1to1 uses only pre-pick images, while armbench3t1 uses both pre-pick and post-pick images.
  • data_path is the directory that stores the dataset.
  • finetune is the path to the pretrained BEiT-3 weights.
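The identification task is essentially retrieval: an image of the picked item is matched against a gallery of reference images. A minimal sketch of that matching step, assuming embeddings have already been extracted by the fine-tuned backbone (the function and variable names here are illustrative, not part of this codebase):

```python
import numpy as np

def identify(query_emb, gallery_embs, gallery_ids):
    """Return the gallery ID whose embedding is most similar to the query.

    Hypothetical post-processing sketch: the fine-tuned backbone would
    produce the embeddings; this only shows the nearest-neighbour match.
    """
    # L2-normalise so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q
    return gallery_ids[int(np.argmax(scores))]

# Toy example: 3 gallery items with 4-d embeddings
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
ids = ["mug", "box", "tape"]
query = np.array([0.1, 0.9, 0.05, 0.0])
print(identify(query, gallery, ids))  # → box
```

In the "whole gallery" setting the query is compared against every reference object; in the "within basket" setting the gallery is restricted to the items known to be in the source container.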

Defect Detection

python armbench/defection_id.py --model 'beit3_base_patch16_224' --input_size 224 --task 'defection1by1' --batch_size 128 \
 --layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
 --data_path 'path/to/your/dataset'  --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05  \
 --save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'
  • model specifies the name of the model used in this experiment.
  • log_dir is the directory that stores the output logs.
  • task selects the defect detection setup (defection1by1).
  • data_path is the directory that stores the dataset.
  • finetune is the path to the pretrained BEiT-3 weights.
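Conceptually, defect detection reduces to classifying each transferred item. A sketch of the decision step applied on top of classifier logits, with an assumed label set (the actual Armbench defect taxonomy and any thresholds may differ):

```python
import numpy as np

# Assumed label set for illustration; the real Armbench classes may differ.
DEFECT_CLASSES = ["nominal", "deformed", "open"]

def classify_defect(logits):
    """Softmax over per-class logits, then pick the most likely class."""
    z = np.exp(logits - np.max(logits))  # subtract max for stability
    probs = z / z.sum()
    idx = int(np.argmax(probs))
    return DEFECT_CLASSES[idx], float(probs[idx])

label, confidence = classify_defect(np.array([0.2, 2.5, -1.0]))
print(label)  # → deformed
```

In the actual pipeline the logits would come from the fine-tuned BEiT-3 classification head trained by defection_id.py; this snippet only illustrates the final decision.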

Citation

If you find this repository useful, please consider citing our work:

@misc{long2023robollm,
      title={RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models}, 
      author={Zijun Long and George Killick and Richard McCreadie and Gerardo Aragon Camarasa},
      year={2023},
      eprint={2310.10221},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Acknowledgement

This repository is built using the BEiT, BEiT v2, BEiT-3, CLIP, open_clip, Oscar, DeiT, and DINO repositories, as well as the timm library.

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree.

Microsoft Open Source Code of Conduct
