
Advice for running RT-1 in a simple Pick and Place environment #23

Open

FelixHegg opened this issue Nov 18, 2023 · 2 comments

Comments

@FelixHegg

Hi,
first of all, thank you for open-sourcing RT-1, collating the Open X-Embodiment dataset, and releasing trained RT-1 and RT-1-X checkpoints. We are very impressed by the reported capabilities and eager to build on top of this.
We initially tried to run the model on a Franka in our office, but since we struggled to replicate reasonable behaviors for simple tasks, we moved to a minimalistic PyBullet environment for easier experimentation. We tried to find a camera frame and a world frame in which interpreting the action as a position delta seems reasonable, but to no avail.

Our question is this: Do you have advice based on your own experiments for running inference on out-of-distribution settings, particularly regarding the following decisions:

  • What is a recommended coordinate frame in which to interpret the delta positions? Which directions should X, Y, and Z correspond to?

  • We observed that the model behaves differently depending on the camera position. What camera position would you recommend as a starting point?

  • What is a reasonable starting point for denormalizing the actions for the Franka Panda robot? Currently we use the same values as the Bridge dataset.
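For reference, here is a minimal sketch of the denormalization step in the last bullet, assuming the model outputs actions normalized to [-1, 1] and that per-dimension bounds are taken from dataset statistics. The `ACTION_LOW`/`ACTION_HIGH` values below are placeholder assumptions, not the real Bridge statistics:

```python
import numpy as np

# Hypothetical per-dimension action bounds (xyz deltas in meters, rpy deltas
# in radians, gripper in [0, 1]). The real values must come from the
# statistics of the dataset the model was trained/finetuned on.
ACTION_LOW = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
ACTION_HIGH = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])

def denormalize_action(norm_action: np.ndarray) -> np.ndarray:
    """Map a normalized action in [-1, 1] back to metric deltas."""
    norm_action = np.clip(norm_action, -1.0, 1.0)
    # Linear map: -1 -> ACTION_LOW, +1 -> ACTION_HIGH
    return ACTION_LOW + (norm_action + 1.0) / 2.0 * (ACTION_HIGH - ACTION_LOW)
```

Whether this linear map (versus, e.g., mean/std normalization) is the right inverse depends on how the training pipeline normalized actions, which is exactly what the bullet above is asking about.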

Is it expected that one would first need to do some fine-tuning to align the model to a particular action space in an unseen setting? Note that we are currently only trying to pick up a simple cube, which we believe should be within the model's capabilities.
Kind regards,
Felix

@joeljang

Hi, I am wondering if there are any updates regarding this as well!

@kpertsch

Hi Felix and Joel,

Sorry for the delayed reply!
The current RT-1-X model, without finetuning, would only be expected to work well in settings from the training dataset; e.g., a reproduction of the BridgeV2 setup could work out of the box.
It is unlikely the model would work well zero-shot in visually very different environments like the Franka sim environment you are describing (for one, it is not conditioned on action-space information, so it's hard to predict which action space it would output actions in).

There will hopefully be a release of the RT-1-X JAX code soon that should make it easier to finetune the pre-trained checkpoint, which should help a lot with adapting to a new domain.

In the meantime, if you want to get started with finetuning experiments, you can take a look at the Octo model we recently released. It includes example scripts for finetuning to new domains and should hopefully work well on your Franka setup (we have tested finetuning on four different Franka setups across UC Berkeley, Stanford, and CMU)!
