
How to use a non-timm pretrained image tower with a HF pretrained text tower? #543

Answered by rwightman
bryant1410 asked this question in Q&A

@bryant1410 there's no built-in support for loading existing CLIP-trained image tower weights into a new model with a different text encoder. There were some PRs to add support, but they didn't get merged, either because they were combined with lots of other changes or because they got lost in the shuffle. It would be good to add at some point.

To do this right now, you'd have to hack some code to load the image weights yourself. The timm pretrained flag is just passed down to timm to load its pretrained (ImageNet) weights, not CLIP weights.
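As a rough idea of what such a hack could look like, here is a minimal sketch, not something built into the library. It assumes the image tower lives at `model.visual` and that its entries in the CLIP state dict are prefixed with `visual.`; check both against your own checkpoint. The helper names `extract_tower_state` and `load_image_tower` are made up for illustration.

```python
from collections import OrderedDict


def extract_tower_state(state_dict, prefix="visual."):
    """Return only the entries under `prefix`, with the prefix stripped.

    Assumption: image-tower keys in the checkpoint start with `visual.`.
    """
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state_dict.items()
        if key.startswith(prefix)
    )


def load_image_tower(model, clip_state_dict, prefix="visual."):
    """Copy image-tower weights from a CLIP state dict into `model.visual`.

    Assumption: `model.visual` is a torch.nn.Module (or anything exposing
    `load_state_dict`) with the same architecture as the checkpoint's tower.
    """
    tower_state = extract_tower_state(clip_state_dict, prefix)
    if not tower_state:
        raise ValueError(f"no keys in the state dict start with {prefix!r}")
    # strict=True so that an architecture mismatch fails loudly instead of
    # silently leaving part of the tower randomly initialized.
    return model.visual.load_state_dict(tower_state, strict=True)
```

You would build the new model (with the HF text tower) first, load the CLIP checkpoint's state dict with `torch.load(..., map_location="cpu")`, and then call `load_image_tower(model, clip_state_dict)`. Some checkpoints wrap the weights under a `state_dict` key or add a `module.` prefix from distributed training, so you may need to unwrap or adjust the prefix before calling it.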

Answer selected by bryant1410