Inconsistent Alignment: Discrepancy Between Language Description and Observations #33

nuomizai opened this issue Jan 4, 2024 · 0 comments

Comments

@nuomizai
Copy link

nuomizai commented Jan 4, 2024

I ran the visualization examples from the colab with the dataset utaustin_mutex, but the GIF I got does not match the language description. For example, I used the following code to extract the observations and the corresponding language instruction from the first episode of utaustin_mutex:

import os

import tensorflow_datasets as tfds
from PIL import Image
from IPython import display
from tqdm import tqdm


def dataset2path(dataset_name):
    # A few datasets use a version string different from the default.
    if dataset_name == 'robo_net':
        version = '1.0.0'
    elif dataset_name == 'language_table':
        version = '0.0.1'
    else:
        version = '0.1.0'
    # return f'gs://gresearch/robotics/{dataset_name}/{version}'
    # Expand '~' explicitly; builder_from_directory expects a resolved path.
    return os.path.expanduser(f'~/tensorflow_datasets/{dataset_name}/{version}')



def as_gif(images, path='temp.gif'):
    # Render the images as a GIF (duration = per-frame delay in ms).
    images[0].save(path, save_all=True, append_images=images[1:], duration=100, loop=0)
    gif_bytes = open(path, 'rb').read()
    return gif_bytes

_full_dataset = ['utaustin_mutex']

display_key = 'image'

for dataset in tqdm(sorted(_full_dataset), desc="processing dataset"):
    b = tfds.builder_from_directory(builder_dir=dataset2path(dataset))
    ds = b.as_dataset(split='train[:1]')  # take only the first episode
    episode = next(iter(ds))
    images = [step['observation'][display_key] for step in episode['steps']]
    images = [Image.fromarray(image.numpy()) for image in images]
    display.Image(as_gif(images))

    step = next(iter(episode['steps']))
    language_instruction = step['language_instruction']
    language_instruction = language_instruction.numpy().decode("utf-8")
    print(language_instruction)


The language description I got was:

Kindly spot and seek the red cup placed ahead of you.
Cautiously adjust your gripper towards the red cup, gripping it gently.
Find the after-storage area of the caddy.
Relocate the red cup in your grip over the rear portion and softly release it into the compartment.

However, the GIF shows only a blue cup, not a red one. Is something wrong with this dataset, or is the problem in the visualization code?
(attached GIF: temp.gif)
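One way to narrow down whether the mismatch comes from the dataset itself or from how the episode is read is to check that every step of the episode carries the same language instruction; if it varies per step, printing only the first step's instruction would not describe the whole GIF. A minimal, library-free sketch of that check (the helper name is hypothetical):

```python
def instructions_consistent(step_instructions):
    """Return True if all steps of an episode carry the same
    language instruction; instructions may be bytes or str."""
    decoded = [
        s.decode("utf-8") if isinstance(s, bytes) else s
        for s in step_instructions
    ]
    # Zero or one distinct instruction means the episode is consistent.
    return len(set(decoded)) <= 1
```

With the episode above, the per-step instructions could be collected as `[step['language_instruction'].numpy() for step in episode['steps']]` and passed to this helper.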
