
How to use the repo? #13

Open
rahuldeo2047 opened this issue Sep 8, 2023 · 1 comment

Comments

rahuldeo2047 commented Sep 8, 2023

Hello,

Thank you for providing this repository; it enabled me to run the example.py script successfully.

I have also read the associated paper, which gave me a solid understanding of the project's context. However, I have a few questions about the practical usage of the repository.

I can run the model with video and text inputs, but I am unsure how to use the "RT-2" component, which is designed for Vision-Language-Action interaction. I might be overlooking something, and I would appreciate any guidance or clarification you could provide.

Additionally, while I have been able to obtain results using video and text inputs, I would appreciate some clarification on how to interpret those results. Any light you can shed on their meaning or implications would be very helpful.

Thank you very much for your assistance, and I look forward to your response.

Best regards,

rahuldeo2047 (Author) added:

For more detail, this is the code I ran:

import torch 
from rt2.model import RT2

model = RT2()

# dummy video batch: (batch, channels, frames, height, width)
video = torch.randn(2, 3, 6, 224, 224)

instructions = [
    'bring me that apple sitting on the table',
    'please pass the butter'
]

# compute the train logits
train_logits = model.train(video, instructions)

# set the model to evaluation mode
model.model.eval()

# compute the eval logits with a conditioning scale of 3
eval_logits = model.eval(video, instructions, cond_scale=3.)

How to interpret eval_logits for further use?
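For what it's worth, here is a minimal sketch of how logits from such a model are commonly decoded, assuming (this is a guess, not confirmed by the repo) that eval_logits has shape (batch, seq_len, vocab_size) with one row of logits per predicted action token, and that the RT-2-style discretization of each action dimension into 256 uniform bins applies. All shapes and the de-discretization range below are assumptions; numpy stands in for the real tensors:

```python
import numpy as np

# Hypothetical stand-in for eval_logits: assumed shape (batch, seq_len, vocab_size),
# i.e. one logit vector per predicted action token. The 256-bin vocabulary
# mirrors the uniform action discretization described in the RT-2 paper.
batch, seq_len, vocab_size = 2, 6, 256
rng = np.random.default_rng(0)
eval_logits = rng.standard_normal((batch, seq_len, vocab_size))

# Greedy decoding: pick the highest-scoring bin at each position.
token_ids = eval_logits.argmax(axis=-1)  # shape (batch, seq_len), ints in [0, vocab_size)

# De-discretize each bin index back to a continuous action value.
# The [-1, 1] target range here is an assumption; the real range depends
# on how the robot's action space was normalized during training.
actions = token_ids / (vocab_size - 1) * 2.0 - 1.0

print(token_ids.shape)   # (2, 6)
print(actions.min() >= -1.0, actions.max() <= 1.0)
```

Under these assumptions, each decoded sequence would correspond to one discretized action (e.g. end-effector translation, rotation, gripper state), but the exact token layout would need to be confirmed against the repository's tokenizer.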
