
Whether the visual encoder participates in training #95

Closed
LoverLost opened this issue Jul 18, 2024 · 3 comments

Comments

@LoverLost

In the 3rd stage, following your paper ('Finally, we further perform instruction tuning of the pre-trained model on visual language instruction datasets'), I'm wondering whether SigLIP is also unfrozen and involved in training during this stage, or whether only the projector and the LLM are trained.

@LoverLost
Author

If possible, I would also like to know which version of LLaMA is used as the LLM for the 3B model.

@Lyken17
Collaborator

Lyken17 commented Aug 14, 2024

The ViT was trained during VILA training.

For 3B, we used Sheared-LLaMA from Princeton.
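For anyone wiring this up themselves, unfreezing the vision encoder alongside the projector and LLM usually comes down to toggling `requires_grad` on each sub-module. The sketch below is a hypothetical illustration, not VILA's actual code: `TinyVLM`, `set_trainable`, and the `nn.Linear` stand-ins are invented for the example (a real model would use SigLIP, an MLP projector, and a LLaMA-family LLM).

```python
import torch.nn as nn


class TinyVLM(nn.Module):
    """Toy stand-in for a vision-language model's three trainable parts."""

    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)  # stand-in for SigLIP
        self.projector = nn.Linear(8, 8)       # stand-in for the MLP projector
        self.llm = nn.Linear(8, 8)             # stand-in for the LLM


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a sub-module."""
    for p in module.parameters():
        p.requires_grad = trainable


model = TinyVLM()

# Instruction-tuning stage as described in the answer above:
# all three components (including the visual encoder) train together.
for part in (model.vision_encoder, model.projector, model.llm):
    set_trainable(part, True)

# An earlier alignment stage would instead freeze the encoder and LLM:
#   set_trainable(model.vision_encoder, False)
#   set_trainable(model.llm, False)
```

The optimizer then only needs the parameters with `requires_grad=True`, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`.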

@Lyken17 Lyken17 closed this as completed Aug 14, 2024