Replies: 1 comment
-
Personally, I'd check out how Google made and trained the ViT-B/16 or ViT-B/32 models that CLIP in this code is set up to use.
-
Not sure if I'm using the right lingo, but is there a way to see the words/phrases the model knows, so I can better choose the words for it to describe the image?