
Whether the visual encoder participates in training #95

Closed
LoverLost opened this issue Jul 18, 2024 · 3 comments

Comments

@LoverLost

In the 3rd stage, following your paper ('Finally, we further perform instruction tuning of the pre-trained model on visual language instruction datasets'), I'm wondering whether SigLIP is also unfrozen and involved in training during this stage, or whether only the projector and the LLM are trained.

@LoverLost
Author

If possible, I would also like to know which version of LLaMA is used as the LLM for the 3B model.

@Lyken17
Collaborator

Lyken17 commented Aug 14, 2024

The ViT was trained during VILA training.

For 3B, we used Sheared-LLaMA from Princeton.
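For anyone wiring this up themselves, unfreezing the vision encoder alongside the projector and LLM usually comes down to toggling `requires_grad` on each sub-module. The sketch below is a hypothetical illustration, not VILA's actual code: `TinyVLM`, `set_trainable`, and the `nn.Linear` stand-ins are invented for the example (a real model would use SigLIP, an MLP projector, and a LLaMA-family LLM).

```python
import torch.nn as nn


class TinyVLM(nn.Module):
    """Toy stand-in for a vision-language model's three trainable parts."""

    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)  # stand-in for SigLIP
        self.projector = nn.Linear(8, 8)       # stand-in for the MLP projector
        self.llm = nn.Linear(8, 8)             # stand-in for the LLM


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a sub-module."""
    for p in module.parameters():
        p.requires_grad = trainable


model = TinyVLM()

# Instruction-tuning stage as described in the answer above:
# all three components (including the visual encoder) train together.
for part in (model.vision_encoder, model.projector, model.llm):
    set_trainable(part, True)

# An earlier alignment stage would instead freeze the encoder and LLM:
#   set_trainable(model.vision_encoder, False)
#   set_trainable(model.llm, False)
```

The optimizer then only needs the parameters with `requires_grad=True`, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`.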

@Lyken17 Lyken17 closed this as completed Aug 14, 2024