Sorry for the late reply; I have been too busy to respond. Let me answer your questions one by one:
Yes, I did. You can find the fine-tuning configuration in the appendix of the paper. I followed the same fine-tuning procedure as InstructBLIP, except that I removed the text input of the Q-Former.
I have also shared the prompts I used in the appendix of the paper. The format is exactly the same as the one I used in the experiments. I do not expect an original InstructBLIP model to work before it is fine-tuned and aligned with the environment dynamics via our proposed DAgger-DPO algorithm (Algorithm 1 in the paper).
I have released the code for DAgger-DPO; please refer to this
Thanks for sharing your great work!
I have a few questions about your work, especially regarding the baselines.
Also, I am looking forward to your code release!