
Finetune with image goal. #74

Closed · zwbx opened this issue Apr 12, 2024 · 6 comments

@zwbx commented Apr 12, 2024

Thanks for the question! We use `task_stack_keys` as a mechanism to do goal-image conditioning.

The image tokenizer roughly implements the following logic:

import jax.numpy as jnp

# Stack the current observation(s) and the goal image(s) channel-wise,
# then tokenize the fused input with a single encoder.
inputs = jnp.concatenate(
    [observations[k] for k in obs_stack_keys] +
    [tasks[k] for k in task_stack_keys],
    axis=-1,
)
tokens = encoder(inputs)

So, when you configure the tokenizer this way:

"primary": ModuleSpec.create(
    ImageTokenizer,
    obs_stack_keys=["image_primary"],
    task_stack_keys=["image_primary"],
    encoder=ModuleSpec.create(SmallStem16),
),

Inside the tokenizer, the "image_primary" entry is extracted from the observations dictionary, the "image_primary" entry is extracted from the tasks dictionary, and the two are concatenated channel-wise before being passed into the conv layers. This is known as early goal fusion: from the very beginning of the network, the model can make pixel-wise comparisons between the camera view at the current timestep and the desired goal camera view (typically a useful inductive bias for goal-reaching tasks).
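
To make the channel-wise stacking concrete, here is a minimal sketch (the array names and the 256x256 resolution are illustrative, not taken from the Octo codebase):

import jax.numpy as jnp

# Hypothetical 256x256 RGB frames: current camera view and goal view.
current_obs = jnp.zeros((256, 256, 3))
goal_image = jnp.zeros((256, 256, 3))

# Early fusion: stack along the channel axis before any conv layers run,
# so the encoder sees a single 6-channel image it can compare pixel-wise.
fused = jnp.concatenate([current_obs, goal_image], axis=-1)
print(fused.shape)  # (256, 256, 6)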


If you don't care about goal-image task conditioning (e.g. you only want language-conditioned training), then you should simply omit the `task_stack_keys` argument (the same applies if you want goal-image conditioning but would prefer to encode/tokenize the goal image and the current observation separately).
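
For example, a language-only tokenizer spec would look roughly like the snippet above minus `task_stack_keys` (a sketch mirroring the config style shown earlier):

"primary": ModuleSpec.create(
    ImageTokenizer,
    obs_stack_keys=["image_primary"],
    # no task_stack_keys: the goal image is never fused in, so task
    # conditioning has to come from elsewhere (e.g. language tokens)
    encoder=ModuleSpec.create(SmallStem16),
),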

In any case, what is happening in your current code is that the config expects a goal image corresponding to "image_primary" in tasks["image_primary"], does not find it in the tasks dictionary, and falls back to inserting a black image in its place (effectively a no-op).
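
A minimal sketch of that fallback (my paraphrase of the behavior described above, not the actual Octo code):

import jax.numpy as jnp

def get_goal_image(tasks, key, obs_shape):
    # If the expected goal image is absent, substitute an all-black
    # (all-zero) image so the channel-wise concat still works but
    # contributes no goal information.
    if key not in tasks:
        return jnp.zeros(obs_shape)
    return tasks[key]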

Originally posted by @dibyaghosh in #25 (comment)

@zwbx (Author) commented Apr 12, 2024

Hi, I checked through the code and could not find where the image goal is loaded in the dataset-loading code. It seems this has not been implemented yet.

@zwbx changed the title from "Thanks for the question! We use `task_stack_keys` as a mechanism to do goal-image conditioning." to "Finetune with image goal." on Apr 12, 2024
@kpertsch (Collaborator)

Image goals are being loaded and are returned as part of the task dictionary from the data loader.
See here:

dataset = dataset.traj_map(
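
For reference, uniform goal relabeling, i.e. picking a random future frame of the same trajectory as the goal, can be sketched like this (hypothetical function and key names; the real logic lives in octo's data pipeline):

import numpy as np

def relabel_with_future_goal(traj, rng):
    # traj["observation"]["image_primary"]: (T, H, W, 3) frames of one trajectory
    frames = traj["observation"]["image_primary"]
    T = frames.shape[0]
    # For each timestep t, sample a goal index uniformly from [t, T)
    goal_idx = rng.integers(low=np.arange(T), high=T)
    traj["task"] = {"image_primary": frames[goal_idx]}
    return traj

# usage: traj = relabel_with_future_goal(traj, np.random.default_rng(0))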

@zwbx (Author) commented Apr 15, 2024

Thanks to this, I was able to successfully train the model using an image goal. However, I'm not sure whether I'm performing inference with the image goal correctly. During inference we don't actually have the future image goal, so what kind of image goal should we use? Should it be one selected from the training set? (Here the train and test sets are defined as variations of the same task and scene.)

@kpertsch (Collaborator)

If you want to evaluate a policy with image goal specification, you need to collect a goal image for your evaluation task. We usually collect this image right before running the evaluation to make sure it's in-distribution with your current scene layout.
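
In code, that workflow might look roughly like this (a sketch assuming the `create_tasks` / `sample_actions` interface from the Octo README; `capture_camera_frame` and `observation` are placeholders for your own robot stack):

import jax
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base")

# Arrange the scene, then snap the goal image right before the rollout
# so it matches the current object layout.
goal_img = capture_camera_frame()  # placeholder, e.g. (256, 256, 3) uint8

# Condition on the goal image instead of a language instruction.
task = model.create_tasks(goals={"image_primary": goal_img[None]})

# `observation` is your batched observation dict from the robot.
actions = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))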

@zwbx (Author) commented Apr 17, 2024

Thanks! Could you explain this in more detail? Given the early-fusion strategy for goal images, I'd guess the model is sensitive to how well the goal image is aligned with the test scene, and I'm curious what degree of alignment is necessary. Is the alignment requirement satisfied if the goal image and the test sample involve the same task, in the same scene, targeting the same object, but with the object in a different location?

@kpertsch (Collaborator)

During training we have always used future images from the same trajectory as goals, so the model likely requires the goal image to show the same object positions.

WenchangGaoT pushed a commit to WenchangGaoT/octo1 that referenced this issue May 10, 2024
@zwbx zwbx closed this as completed Aug 3, 2024